PREFACE



Hyperspectral imaging is an emerging technique in remote sensing data processing 
that expands and improves capability of multispectral image analysis. It takes advantage 
of hundreds of contiguous spectral channels to uncover materials that usually cannot be 
resolved by multispectral sensors. This book is an outgrowth of my research in
hyperspectral image processing and of personal communications with the many people
who have shown interest in my work previously published in various journals. At first
glance, this book may look like a collection of papers authored and co-authored by me.
As a matter of fact, this is not the case. The book has been organized so that all the
chapters are logically connected and cross-reference one another for more
details. In particular, most of the computer simulations and experiments have been
reworked to give a consistent treatment throughout the book. The title of
"Hyperspectral Imaging: Techniques for Spectral Detection and Classification" is used to 
reflect its focus on spectral techniques, i.e. non-literal techniques that are especially 
designed and developed for hyperspectral imagery rather than multispectral imagery. 
Although many techniques already exist in multispectral image processing, some of them 
may not be effective when they are directly applied to hyperspectral imagery. This book
takes the opposite approach and develops techniques from a hyperspectral imagery viewpoint,
where noise is generally not Gaussian and interference plays a more dominant role than
noise in hyperspectral image analysis. More importantly, detection and
classification are carried out with respect to targets of interest rather than pattern classes.

A significant difference from other books is that this book explores applications of 
statistical signal processing techniques in hyperspectral image analysis, specifically, 
subpixel detection and mixed pixel classification. It includes many techniques developed 
in my lab with my former and current Ph.D. students, and systematically integrates these 
techniques in such a unified framework that readers can capture how the ideas were 
developed and evolved. Since many readers whose background is not engineering may 
find a gap in understanding the concepts presented in this book, another objective of this 
book is to make it self-contained so that readers can easily pick up and implement the 
techniques without much difficulty. In doing so, I have included detailed mathematical 
derivations and experiments for illustration. Nevertheless, it by no means claims to be 
comprehensive; rather, it can be viewed as a recipe book that offers various techniques for 
hyperspectral data exploitation. Some of these techniques, such as OSP (Orthogonal
Subspace Projection) and CEM (Constrained Energy Minimization), are mature enough for practical
implementation. They are treated in the book in great detail. In addition, many
techniques developed in this book may also prove handy in the years to come. Due to the
limited scope of the book, many well-known techniques that can be found in numerous
references, such as linear spectral mixture analysis, will not be discussed in this book.
Instead, this book only covers the work that has been done over the years in the Remote
Sensing Signal and Image Processing Laboratory (RSSIPL) at the University of
Maryland, Baltimore County.

Like most books, this book owes much credit to many people who deserve my 
sincere gratitude. These individuals are my former Ph.D. students, Drs. Mark L.G. 
Althouse, Clark Brumbley, Shao-Shan Chiang, Qian Du, Joseph C. Harsanyi, Daniel 
Heinz, Agustin Ifarraguerri, Chien-Shun Lo, Hsuan Ren, Chuin-Mu Wang and Ching-Wen
Yang as well as my current Ph.D. students, Ms. Eliza Yingzi Du, Ms. Kerri
Guilfoyle and Ms. Jianwei Wang. Specifically, I would like to thank Drs. Shao-Shan
Chiang and Hsuan Ren, who spent so much time helping me generate most of the figures
in this book. This book could not have been completed without their contributions. For the data
used in this book, I would like to thank the Spectral Information Technology
Applications Center (SITAC) and Dr. Harsanyi, who provided me with their HYDICE and
AVIRIS data, respectively. My sincere thanks also go to Mr. Paul Lewis and Ms. Judy
Powelson who read part of this book and provided valuable suggestions. 

Finally, I would like to thank Mr. James Buss, Dr. Irving W. Ginsberg, Mr. Paul 
Lewis and Dr. Gregory Palvin for their kind support of my research under funding 
received from the Office of Naval Research, the Bechtel Nevada Corporation through the
Department of Energy and the Spectral Information Technology Applications Center
(SITAC). Last but not least, I would like to particularly thank Dr. James O. Jensen for
his enthusiastic support of an NRC (National Research Council) Senior Research 
Associateship award that I received from the US Army Soldier and Biological 
Command, Edgewood Chemical and Biological Center (ECBC), Aberdeen Proving 
Ground, MD. This timely support allowed me to take advantage of my sabbatical leave to
complete this book.
INTRODUCTION



Hyperspectral imaging is a fast-growing area in remote sensing. It expands and
improves the capability of multispectral image analysis. Two hyperspectral sensors currently
in use and operated on airborne platforms are the 224-band Airborne Visible/Infrared
Imaging Spectrometer (AVIRIS) and the 210-band HYperspectral Digital Imagery Collection
Experiment (HYDICE). They take advantage of hundreds of contiguous spectral channels
to uncover materials that usually cannot be resolved by multispectral sensors. However,
this advantage comes at a price: many unknown signal sources may also be
extracted by the sensors with no prior knowledge. In particular, these signal sources may
include targets with size smaller than the ground sampling distance (GSD), which are 
generally embedded in a single pixel and cannot be identified by visual inspection. Their 
presence can only be measured by their spectral properties. Under these circumstances,
detection of such small targets cannot be accomplished by classical spatial-based image
processing techniques. Instead, it must be carried out at the subpixel level. Therefore, one of
the great challenges for hyperspectral imaging is subpixel detection, which is not treated in
standard spatial-based image processing. After a target is detected, the next step is to
classify detected targets according to their spatial or spectral properties. However, due to
the high spectral resolution of a hyperspectral sensor and the large ground area covered by
each pixel, more than one material substance is often present in a single
pixel, which is then no longer a pure pixel. To deal with such a pixel effectively, it must
be treated as a mixed pixel in which several substances are present. This mixing further
complicates image classification since traditional pure pixel-based classification
techniques may not be applicable, or may not be effective even if they can be applied. Therefore,
another challenging problem is to develop effective techniques for mixed pixel
classification. This book is written specifically to address these two areas, subpixel
detection and mixed pixel classification. In particular, it focuses on problem-solving
techniques rather than theoretical treatment. Most techniques developed in this book are 
based on engineering perspectives and derived from aspects of statistical signal 
processing. They will be presented in a unifying framework so that readers can follow the 
flow, easily grasp the ideas and get hands-on algorithmic implementation. 
1.1 BACKGROUND

The topics discussed in the book are related to target detection and classification 
rather than pattern classification as commonly treated in Duda and Hart (1973). The
targets referred to here are primarily man-made targets or targets whose signatures are
spectrally distinct from the image background. For this reason, the terminology of target
classification used in this book is very different from land-cover classification generally 
considered in remote sensing literature. In pattern classification a classifier must classify 
image data into a number of pattern classes, which also include background classes. 
Although the background knowledge may be obtained directly from the image data by
unsupervised means, it may not be accurate. In some cases, it may not be reliable,
particularly, when the targets are relatively small or when the image background is 
complicated due to the fact that many unknown signal sources can be uncovered by high- 
resolution sensors. Besides, image background generally varies and is difficult to 
characterize. As a result, it is nearly impossible to classify image background without 
complete prior knowledge. On the other hand, in target classification we are generally 
interested in classification of targets rather than image background. In many situations, 
we may have prior knowledge about targets that we would like to classify. Under this 
circumstance, we perform target classification with no need of background knowledge. 
Additionally, we also make a distinction among detection, classification, discrimination, 
identification and quantification. For instance, target detection does not necessarily imply
target classification, nor does target classification imply target discrimination. A
target detector that achieves 100% detection may result in 0% classification since it
may not be able to classify the targets it detects. One such detector is an anomaly
detector, which can detect anomalous targets but may not be able to classify the detected 
anomalies. Similarly, a target classifier may classify targets correctly, but may fail to 
discriminate all the classified targets. For example, a target classifier can classify vehicles 
as wheeled and tracked vehicles, but it may be unable to differentiate a jeep from a truck. 
On the other hand, a target discriminator may discern one target from another, but does 
not necessarily classify these targets. However, in some applications, researchers may 
consider target discrimination as a follow-up operation of target classification. In this 
case, the target discrimination is only applied after targets are classified. A target 
identifier usually requires a database or spectral library to identify targets of interest. It 
performs more subtle functionality than detection, discrimination and classification. A 
target quantifier can perform quantification as a target is detected. The diagram depicted 
in Fig. 1.1 may help clarify these terminologies. Nevertheless, it should be noted that 
this differentiation is the author's preference and by no means a standard taxonomy. 
As the book title implies, the primary goal of this book is to develop spectral
techniques for subpixel detection and mixed pixel classification. In general, target 
detection and classification in remotely sensed images can be conducted spatially, 
spectrally or both. Here, we are specifically interested in non-literal (referred to as 
spectral) analysis techniques as opposed to literal (spatial-based) processing techniques. 
Non-literal exploitation is “the process of extracting non-spatial information from digital 
image data, automatically, semi-automatically, using nontraditional, advanced processing 
techniques which may employ models, measurements, signatures or other information to 
perform one or more of the following functions at the requisite levels of specificity: 
detect, geolocate, classify, discriminate, characterize, identify, quantify, point, track, 
predict, target or map and chart objects, emissions, activities or events of interest 
represented in the imagery” (HYMSMO, 1998). 



1.2 OUTLINE OF THE BOOK 

This book consists of 18 chapters: Chapters 2-7 on subpixel detection, Chapters 8-17
on mixed pixel classification (MPC) and Chapter 18 on conclusions and further topics
which are not treated in this book. Each chapter is accompanied by extensive experiments
for illustration. Chapters 2-17 are further categorized into five parts. PART I is made up
of only one chapter, Chapter 2, which investigates several hyperspectral measures for
spectral characterization. PART II is devoted to subpixel detection and comprises
five chapters, Chapters 3-7, covering a wide range of topics: supervised
unconstrained and target abundance-constrained subpixel detection in Chapter 3,
supervised target signature-constrained subpixel detection in Chapter 4, unsupervised
subpixel detection in Chapter 5, anomaly detection in Chapter 6 and detection sensitivity
to different levels of target information and noise in Chapter 7. PART III consists of two
chapters, Chapters 8-9, on unconstrained linear mixture analysis-based MPC. PART IV
extends PART III to constrained linear mixture analysis-based MPC and includes three
chapters: target abundance-constrained MPC in Chapter 10 and target signature-constrained
MPC in Chapters 11-12. PART V deals with automatic mixed pixel classification
(AMPC) when no a priori target knowledge is available. Five chapters are covered in
this part: unsupervised MPC in Chapter 13, anomaly classification in Chapter 14, linear
spectral random mixture analysis in Chapter 15, projection pursuit in Chapter 16 and
estimation of virtual dimensionality in Chapter 17. Finally, Chapter 18 concludes with some
topics that are not treated in this book.

1.2.1 Stochastic Hyperspectral Measures 

A remotely sensed image is actually an image cube with the third dimension 
specified by spectral wavelengths. As a result, each image pixel is indeed a column 
vector, of which each component represents a particular spectral band. The spectral 
information contained in an image pixel vector generally cannot be explored by spatial- 
based pure pixel classification methods. The first part of this book, Chapter 2, considers
some commonly used similarity measures in pattern classification and also develops 
several information theoretic criteria that can be used to design new hyperspectral 
measures. The concept of uncertainty is introduced to characterize spectral variations 
caused by unpredicted sources such as atmospheric and scattering effects. With this
interpretation, well-established information theory can be readily applied to the analysis
of spectral properties. In particular, two information theoretic hyperspectral measures,
spectral information divergence (SID) and hidden Markov model-based information
divergence (HMMID), can be derived from self-information for spectral similarity. They
can be viewed as stochastic measures as opposed to deterministic measures commonly
used in the literature such as spectral angle mapper (SAM) and Euclidean distance. A
functional taxonomy of the hyperspectral measures studied in this chapter is depicted in 
Fig. 1.2. 
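
As a concrete illustration of the deterministic and stochastic measures named above, the following is a minimal Python sketch of SAM, Euclidean distance and SID. The SID form used here, the symmetric relative entropy between two spectra normalized to sum to one, is the commonly cited definition and is stated as an assumption, since the measure is only formally derived in Chapter 2. The three spectra are hypothetical values chosen for illustration.

```python
import math

def sam(x, y):
    """Spectral angle (in radians) between two spectral vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    # Clamp to [-1, 1] to guard against rounding error before acos.
    return math.acos(max(-1.0, min(1.0, dot / (norm_x * norm_y))))

def euclidean_distance(x, y):
    """Euclidean distance between two spectral vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def sid(x, y):
    """Assumed SID form: symmetric relative entropy of the two spectra
    after each is normalized to sum to one (requires positive values)."""
    sum_x, sum_y = sum(x), sum(y)
    p = [a / sum_x for a in x]
    q = [b / sum_y for b in y]
    d_pq = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))
    d_qp = sum(qi * math.log(qi / pi) for pi, qi in zip(p, q))
    return d_pq + d_qp

# Hypothetical four-band spectra.
s1 = [0.10, 0.20, 0.40, 0.30]
s2 = [0.20, 0.40, 0.80, 0.60]   # s1 scaled by a factor of 2
s3 = [0.40, 0.30, 0.20, 0.10]   # a differently shaped spectrum

for name, other in (("s2", s2), ("s3", s3)):
    print(name, sam(s1, other), euclidean_distance(s1, other), sid(s1, other))
```

Because s2 is a scaled copy of s1, SAM and SID are both close to zero for that pair while the Euclidean distance is not, which shows that in this sketch the angle-based and divergence-based measures ignore overall brightness.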




Figure 1.2: Functional taxonomy of hyperspectral measures 

1.2.2 Subpixel Detection 

The subpixel detection presented in this book refers to spectral target detection,
i.e. non-literal detection conducted on a single pixel basis. Three types of subpixel detection
are considered: supervised subpixel detection, unsupervised subpixel detection and
automatic subpixel detection. Supervised subpixel detection requires a priori target
knowledge and can be implemented with or without constraints, whereas unsupervised
subpixel detection does not require any prior target information because the needed target
information can be extracted directly from the image data in an unsupervised manner.
Unlike supervised and unsupervised subpixel detection, automatic subpixel detection,
generally referred to as anomaly detection, does not need target information at all. A
functional taxonomy of subpixel detection is delineated in Fig. 1.3.

In Chapter 3 we start with the simplest subpixel detection technique, orthogonal
subspace projection (OSP), which is supervised and unconstrained. From there we 
consider two supervised partially target abundance-constrained least-squares methods, 
sum-to-one constrained least-squares (SCLS) and non-negatively constrained least-squares 
(NCLS). Chapter 4 derives two supervised target signature-constrained methods, 
constrained energy minimization (CEM) and target-constrained interference-minimized 
filter (TCIMF). Because it is generally difficult to obtain a priori target knowledge for 
supervised subpixel detection methods, three unsupervised algorithms are introduced in
Chapter 5: the unsupervised vector quantization (UVQ) algorithm, the unsupervised target
generation process and the least-squared error method. These allow us to extract the necessary
target information directly from the image data, referred to as a posteriori target
information, to extend the capability of the supervised subpixel detection techniques. Since the
a posteriori target information may not be accurate, another type of subpixel detection,
anomaly detection, is considered in Chapter 6, which does not require a priori or a
posteriori target information. It extracts targets whose spectral
signatures are distinct from their surrounding pixels. Since unsupervised subpixel
detection is sensitive to the target information used, whereas anomaly detection is
sensitive to noise, Chapter 7 is devoted to an analysis of sensitivity to the level of
target information used and to noise.
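
To make the idea of a target signature-constrained detector more tangible, the following Python sketch builds a CEM filter. It assumes the standard closed form, w = R^{-1} d / (d^T R^{-1} d), where R is the sample correlation matrix of the image pixels and d is the desired target signature. This form is quoted here as an assumption, since the derivation itself belongs to Chapter 4. The pixel values, the helper names and the small linear solver are hypothetical and are included only to keep the example self-contained.

```python
def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [a - f * c for a, c in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]

def cem_filter(pixels, d):
    """Assumed CEM weights: w = R^{-1} d / (d^T R^{-1} d)."""
    n_bands, n_pixels = len(d), len(pixels)
    # Sample correlation matrix R = (1/N) * sum of r r^T over all pixels.
    R = [[sum(r[i] * r[j] for r in pixels) / n_pixels
          for j in range(n_bands)] for i in range(n_bands)]
    r_inv_d = solve(R, d)
    scale = dot(d, r_inv_d)
    return [v / scale for v in r_inv_d]

def detect(w, r):
    """Detector output as the inner product of the filter vector with a pixel."""
    return dot(w, r)

def output_energy(w, pixels):
    """Average squared filter output over the image pixels."""
    return sum(detect(w, r) ** 2 for r in pixels) / len(pixels)

# Hypothetical three-band data: four background-like pixels and one target pixel.
d = [0.2, 0.3, 0.9]
pixels = [[0.90, 0.50, 0.10], [0.80, 0.60, 0.20],
          [0.85, 0.45, 0.15], [0.95, 0.55, 0.12], d]

w = cem_filter(pixels, d)
matched = [v / dot(d, d) for v in d]   # naive filter that also passes d with gain 1
print(detect(w, d), output_energy(w, pixels), output_energy(matched, pixels))
```

Both filters pass the desired signature with unit gain, but the CEM weights give an average output energy no larger than that of the naive matched filter, which is the sense in which the energy is "minimized" subject to the signature constraint.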




Figure 1.3: Functional taxonomy of subpixel detection 



1.2.3 Mixed Pixel Classification (MPC) 

Classification in standard image processing is usually performed spatially on a pure 
pixel basis where objects of interest are classified based on their spatial properties. In a 
remote sensing image scene, the image pixel is generally composed of different material 
substances, which cannot be analyzed by spatial-based pure pixel classification methods. 
In this case, using spectral properties is an effective means to detect and classify the 
materials present in a single pixel. Mixed pixel classification (MPC) provides such a 
solution. A commonly used approach to MPC is linear unmixing. It models the 
spectrum of each image pixel vector as a linear mixture of spectra of targets that are 
assumed to be present in the image data. The classification is then performed based on 
estimated abundance fractions of target signatures. As a result, the images produced by 
MPC are generally gray scale and represent target abundance fractions. The MPC is then
carried out by visual inspection of the target abundance fractional images. This abundance
estimation approach is very different from the classical class-membership labeling process
carried out by pure pixel classification (PPC) where each pure pixel must be assigned to
one and only one class. In contrast to PPC, MPC estimates the abundance fraction of
each possible target and uses it as a criterion for classification. In this case, a pixel can be
classified by MPC into more than one class, a situation which never occurs in PPC.
Furthermore, if all the target abundance fractions are non-negative, we can normalize
these fractions to sum to unity to form a probability vector. The resulting probabilities can be
viewed as a posteriori probabilities that provide the likelihood of each target being
assigned to the pixel. With this interpretation, MPC can be thought of as an approach
which produces the likelihood of each target being detected in each image pixel, whereas
PPC can be considered a method that thresholds the estimated target abundance fractions
into either 1 or 0 where a “1” indicates a detected target and a “0” indicates target absence. In
this book, different approaches to MPC are investigated and a functional taxonomy of
MPC is given in Fig. 1.4.
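
The contrast between abundance estimation and class labeling can be sketched in a few lines of Python. The example below assumes a noise-free pixel mixed from two hypothetical signatures, estimates the abundances by unconstrained least squares, normalizes them into a probability vector and finally applies a winner-take-all rule as one possible way of reducing the result to a PPC-style 0/1 label. The function names, the signatures and the winner-take-all rule are illustrative assumptions and are not taken from the book.

```python
def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def ls_abundances(r, m1, m2):
    """Unconstrained least-squares abundances for r ~ a1*m1 + a2*m2,
    obtained from the 2x2 normal equations by Cramer's rule."""
    g11, g12, g22 = dot(m1, m1), dot(m1, m2), dot(m2, m2)
    b1, b2 = dot(m1, r), dot(m2, r)
    det = g11 * g22 - g12 * g12
    return [(g22 * b1 - g12 * b2) / det, (g11 * b2 - g12 * b1) / det]

def to_probabilities(abundances):
    """Normalize non-negative abundance fractions so that they sum to one."""
    if any(a < 0 for a in abundances):
        raise ValueError("abundance fractions must be non-negative")
    total = sum(abundances)
    return [a / total for a in abundances]

def to_pure_label(abundances):
    """Assumed hard decision: 1 for the largest fraction, 0 for the others."""
    k = max(range(len(abundances)), key=lambda i: abundances[i])
    return [1 if i == k else 0 for i in range(len(abundances))]

# Hypothetical three-band target signatures and a 70%/30% mixed pixel.
m1 = [0.1, 0.4, 0.8]
m2 = [0.7, 0.5, 0.2]
r = [0.7 * a + 0.3 * b for a, b in zip(m1, m2)]

abundances = ls_abundances(r, m1, m2)
probabilities = to_probabilities(abundances)
label = to_pure_label(abundances)
print(abundances, probabilities, label)
```

The mixed pixel is described by two fractions under MPC, whereas the hard decision keeps only the dominant target and discards the information that a second target occupies part of the pixel.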




Figure 1.4: Functional taxonomy of mixed pixel classification 
1.2.3.1 Unconstrained MPC

The simplest approach to MPC is linear spectral mixture analysis (LSMA), which 
has been widely used in the past. Due to mathematical tractability most LSMA-based 
techniques are supervised and unconstrained, meaning that the complete target knowledge 
must be given and no constraints are imposed on the linear mixture model used in 
LSMA. One such approach is the Gaussian maximum likelihood classification. It is an a 
priori approach, which assumes that noise in the linear mixture model is Gaussian so 
that a closed-form solution for MPC can be derived. Recently, an alternative approach,
called the orthogonal subspace projection (OSP) approach, was developed and has received
considerable interest in MPC. Unlike Gaussian maximum likelihood classification,
OSP only requires second-order statistics of noise and a posteriori target knowledge,
which can be obtained from the image data. Chapter 8 considers a class of a posteriori
OSP-based classifiers, which can be derived from the original OSP classifier developed
by Harsanyi and Chang, itself regarded as an a priori OSP. Interestingly, it can
also be shown that the a priori Gaussian maximum likelihood classifier is essentially an a
posteriori OSP classifier. In order to evaluate the performance of MPC relative to that of
PPC, Chapter 9 conducts a comparative analysis and quantitative study among several
commonly used methods, specifically a comparison between the OSP-based approach to
MPC and Fisher’s linear discriminant analysis for PPC. Since MPC performs
abundance estimation rather than the class labeling of traditional PPC, two mixed-to-pure pixel
converters (MPCV) are also suggested to convert the abundance fractional images
produced by MPC to binary images so that MPC can be reduced to PPC. As a result,
MPC can be compared against PPC via MPCV.
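
A rough feel for the OSP idea can be had from the simplest possible case, in which there is one desired signature d and a single undesired signature u. The sketch below assumes the usual form of the classifier, the inner product of d with the pixel after the pixel has been projected onto the orthogonal complement of u. With several undesired signatures the projector would involve a matrix pseudo-inverse, which is omitted here. The signature values and function names are hypothetical.

```python
def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def reject(r, u):
    """Project r onto the orthogonal complement of the undesired signature u."""
    c = dot(u, r) / dot(u, u)
    return [ri - c * ui for ri, ui in zip(r, u)]

def osp_detector(r, d, u):
    """Assumed single-interferer OSP output: d^T (I - u u^T / (u^T u)) r."""
    return dot(d, reject(r, u))

# Hypothetical three-band signatures.
d = [0.1, 0.4, 0.8]   # desired target signature
u = [0.7, 0.5, 0.2]   # undesired signature

mixed = [0.4 * a + 0.6 * b for a, b in zip(d, u)]
print(osp_detector(u, d, u))       # undesired signature is annihilated
print(osp_detector(mixed, d, u))   # response scales with the target fraction
print(osp_detector(d, d, u))       # response to the pure target signature
```

The output for the mixed pixel is 0.4 times the output for the pure target signature, regardless of how much of the undesired signature is present, which is why the operator is useful as a first step toward abundance estimation.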

1.2.3.2 Constrained MPC

One of the main reasons we are interested in unconstrained MPC is that it has
closed-form solutions and is simple and easy to implement. Constrained MPC is 
considered to be more challenging because constraints generally prohibit MPC from 
rendering analytical solutions. Two types of constraints are usually imposed. One is 
referred to as target abundance-constrained MPC (TACMPC), which imposes constraints 
on target abundance fractions to satisfy desired properties. In this case, the unconstrained 
LSMA considered in PART III can be extended to constrained LSMA with the 
abundance sum-to-one constraint (ASC) and abundance non-negativity constraint (ANC) 
imposed on the linear mixture model used in LSMA. Since no closed-form solutions are 
generally available, we must rely on numerical algorithms to generate optimal solutions. 
To meet this need, Chapter 10 presents an efficient least-squares based algorithm, called
the fully constrained least-squares (FCLS) algorithm. Because the FCLS algorithm
is designed for solving the fully constrained linear mixing problem, it can be further used for
material quantification. One major drawback of LSMA is the requirement of complete target
knowledge, which must be given a priori. This also includes image background
signatures. Unfortunately, obtaining such complete knowledge, specifically of the image
background, is very difficult if not impossible. In order to cope with this issue, another
type of constrained MPC, referred to as target signature-constrained MPC (TSCMPC), is
considered in Chapter 11, where the linearly constrained minimum variance (LCMV) detector considered in Chapter 4 is extended
to an LCMV classifier. Rather than constraining target abundance fractions as does 
TACMPC, TSCMPC constrains the directions of target signature vectors. In this case, 
only target signature vector directions are of interest. The LCMV classifier constrains a 
set of multiple target signature vectors so that only targets with desired constrained 
directions will pass through the filter while the output energy caused by signature vectors
from other directions will be minimized. This approach is based on the fact that two
hyperspectral image pixel vectors that have similar spectra or point in the same
direction will be considered to be in the same target class. Chapter 12 develops
another new TSCMPC approach, linearly constrained discriminant analysis (LCDA),
which constrains the directions of target signature vectors in a linear mixture model to
be aligned with predetermined coordinates so that the targets of interest can be
classified and separated in a way that we desire. Surprisingly, the derived LCDA
classifier turns out to be a constrained version of Harsanyi-Chang's OSP classifier in 
Chapter 8. 
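
The effect of imposing both abundance constraints can be seen in the two-signature special case, where the problem does have a closed form. The Python sketch below is not the FCLS algorithm of Chapter 10. It only assumes that, with two signatures, the sum-to-one constraint reduces the model to a single fraction a (so that the abundances are a and 1 - a) and that the non-negativity constraint then amounts to clipping a to the interval [0, 1]. The signatures and function name are hypothetical.

```python
def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def constrained_two_signature_abundances(r, m1, m2):
    """Least-squares fit of r ~ a*m1 + (1 - a)*m2 with 0 <= a <= 1.

    The sum-to-one constraint is built into the model, and because the
    remaining problem is a one-dimensional quadratic in a, clipping the
    unconstrained minimizer to [0, 1] enforces non-negativity."""
    diff = [a - b for a, b in zip(m1, m2)]
    resid = [a - b for a, b in zip(r, m2)]
    a = dot(resid, diff) / dot(diff, diff)
    a = max(0.0, min(1.0, a))
    return [a, 1.0 - a]

# Hypothetical three-band signatures.
m1 = [0.1, 0.4, 0.8]
m2 = [0.7, 0.5, 0.2]

inside = [0.25 * a + 0.75 * b for a, b in zip(m1, m2)]
outside = [1.5 * a - 0.5 * b for a, b in zip(m1, m2)]   # not a physical mixture

print(constrained_two_signature_abundances(inside, m1, m2))
print(constrained_two_signature_abundances(outside, m1, m2))
```

For the second pixel an unconstrained fit would return fractions of 1.5 and -0.5, whereas the constrained fit returns 1 and 0. With more than two signatures no such shortcut exists in general, which is the reason a numerical algorithm is needed.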

1.2.3.3 Automatic Mixed Pixel Classification (AMPC)

The automatic mixed pixel classification (AMPC) considered in this part includes 
unsupervised MPC and automatic target classification. Like automatic subpixel 
detection, the former requires a posteriori target knowledge that can be obtained directly 
from the image data by unsupervised means, while the latter does not need any a
priori or a posteriori target information. The unsupervised MPC can be viewed as an 
unsupervised extension of supervised MPC considered in PART III and PART IV. 
Chapter 13 presents two versions of unsupervised MPC: the desired target
detection and classification algorithm (DTDCA), which is developed to search for specific
targets, and the automatic target detection and classification algorithm (ATDCA), which is
designed for the purpose of surveillance and monitoring. Two types of automatic target
classification are of particular interest. One type is anomaly classification in Chapter 14,
which extends the ability of anomaly detection considered in Chapter 6 to anomaly
classification. Another type discussed in Chapters 15-16 performs target detection and
classification simultaneously with no need of any a priori or a posteriori target 
information. Chapter 15 borrows the concept of the well-known blind source separation
technique, independent component analysis (ICA), to establish a linear spectral random
mixture analysis (LSRMA), which models potential target sources as unknown random
signal sources. Chapter 16 extends the concept of principal components analysis (PCA)
and ICA to projection pursuit (PP) with a projection index designed to capture interesting
structures of image data. When the projection index is the data variance, PP turns out to
be PCA. When the projection index measures the statistical independence among all
projection images, PP becomes ICA.
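
The remark that PP reduces to PCA when the projection index is the data variance can be checked numerically. The following Python sketch searches for the unit vector that maximizes the variance of the projected data by power iteration on the sample covariance matrix, which yields the first principal component direction. The two-dimensional data, the iteration count and the function names are hypothetical choices made for illustration.

```python
import math

def covariance(data):
    """Sample covariance matrix of a list of equal-length vectors."""
    n, dim = len(data), len(data[0])
    mean = [sum(x[k] for x in data) / n for k in range(dim)]
    centered = [[x[k] - mean[k] for k in range(dim)] for x in data]
    return [[sum(c[i] * c[j] for c in centered) / n for j in range(dim)]
            for i in range(dim)]

def max_variance_direction(data, iters=200):
    """Power iteration for the unit vector maximizing projected variance."""
    C = covariance(data)
    w = [1.0] + [0.0] * (len(C) - 1)
    for _ in range(iters):
        w = [sum(C[i][j] * w[j] for j in range(len(w))) for i in range(len(C))]
        norm = math.sqrt(sum(v * v for v in w))
        w = [v / norm for v in w]
    return w

def projected_variance(data, w):
    """Projection index used here: variance of the data projected onto w."""
    proj = [sum(xk * wk for xk, wk in zip(x, w)) for x in data]
    mean = sum(proj) / len(proj)
    return sum((p - mean) ** 2 for p in proj) / len(proj)

# Hypothetical two-band data spread mainly along the diagonal.
data = [[1.0, 1.1], [2.0, 1.9], [3.0, 3.2], [4.0, 3.9], [5.0, 5.1]]

w = max_variance_direction(data)
print(w, projected_variance(data, w))
print(projected_variance(data, [1.0, 0.0]), projected_variance(data, [0.0, 1.0]))
```

The direction found lies close to the diagonal and gives a larger projected variance than either band axis alone. Replacing the variance by another index, such as a measure of statistical independence, would change which structure the projection captures.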

One of the most challenging problems encountered in unsupervised target detection is
determining how many targets are assumed to be present in the image data. This issue is
addressed in Chapter 17. Instead of using the traditional terminology, intrinsic dimensionality (ID), we
introduce a new definition, virtual dimensionality (VD), which we think more
appropriately reflects the nature of the problem. It is defined as the number of spectrally
distinct signal sources present in image data, which may include image endmembers,
unidentified signal sources and unknown interferers. Three Neyman-Pearson detection
theory-based eigen-thresholding methods are proposed in Chapter 17 to estimate the VD.

1.2.4 Hyperspectral Data to Be Used in the Book 

Three data sets will be used for experiments throughout this book. Each of these data 
sets has a particular purpose of usage. The first data set was presented in (Harsanyi and 
Chang, 1994) and is AVIRIS reflectance data shown in Fig. 1.5, which will be mainly 
used for computer simulations. There are five field reflectance spectra, blackbrush,
creosote leaves, dry grass, red soil and sagebrush, with spectral coverage from 0.4 μm to
2.5 μm and 158 bands after the water bands are removed. The second and third data sets
to be used for experiments in this book are AVIRIS and HYDICE data. The AVIRIS
image shown in Fig. 1.6(a) is the same image data considered in Harsanyi and Chang
(1994) and Chang et al. (1998b), which has 10 nm spectral resolution and 20 m spatial
resolution. There are five target signatures of interest: cinders, rhyolite, playa (dry lake),
vegetation and shade. It is a subscene of 200 x 200 pixels extracted from the upper left
corner of the Lunar Crater Volcanic Field in Northern Nye County, Nevada, delineated by
white lines as shown in Figure 1.6(b).




Figure 1.5. Spectra of five AVIRIS reflectances 




(a) LCVF subscene (200 x 200) (b) AVIRIS LCVF scene (512 x 614)

Figure 1.6. AVIRIS LCVF image



The HYDICE image shown in Fig. 1.7(a) has a size of 64 x 64 pixels with 15 panels in
the scene. Within the scene there is a large grass field background, a forest on the left
edge and a barely visible road running along the right edge of the scene. Low signal/high
noise bands (bands 1-3 and bands 202-210) and water vapor absorption bands (bands
101-112 and bands 137-153) were removed. The spatial resolution is 1.5 m and the spectral
resolution is 10 nm. The 15 panels are located in the center of the grass field and are
arranged in a 5 x 3 matrix as shown in Fig. 1.7(b), which provides the ground truth map
of Fig. 1.7(a). Each element in this matrix is a square panel denoted by $p_{ij}$ with row
indexed by $i = 1, \ldots, 5$ and column indexed by $j = 1, 2, 3$. For each row $i = 1, \ldots, 5$, the
three panels $p_{i1}, p_{i2}, p_{i3}$ were painted with the same material but have three different sizes.
For each column $j = 1, 2, 3$, the five panels $p_{1j}, p_{2j}, p_{3j}, p_{4j}, p_{5j}$ have the same size but were painted with
five different materials. It should be noted that the panels in rows 2 and 3 are made of
the same material with different paints, as are the panels in rows 4 and 5. Nevertheless,
they were still considered as different materials. The sizes of the panels in the first,
second and third columns are 3 m x 3 m, 2 m x 2 m and 1 m x 1 m, respectively. So, the
15 panels have five different materials and three different sizes. Fig. 1.7(b) shows the
precise spatial locations of these 15 panels, where red pixels (R pixels) are the panel center
pixels and the pixels in yellow (Y pixels) are panel pixels mixed with background. The
1.5 m spatial resolution of the image scene suggests that most of the 15 panels are one
pixel in size, except for $p_{21}, p_{31}, p_{41}, p_{51}$, which are two-pixel panels. Since the size of the
panels in the third column is 1 m x 1 m, they cannot be seen visually in Fig. 1.7(a)
because their size is less than the 1.5 m pixel resolution.
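
The band bookkeeping and the panel-to-pixel size relationship described above can be worked out with a few lines of Python. The sketch assumes the 210-band HYDICE total quoted earlier in this chapter and the removed band ranges and panel sizes listed in the text. The resulting count of retained bands is derived here from those ranges rather than quoted from the book, and the variable names are hypothetical.

```python
TOTAL_BANDS = 210          # HYDICE band count as stated in the text
PIXEL_SIZE_M = 1.5         # spatial resolution in meters

# Inclusive band ranges removed: low signal/high noise and water vapor absorption.
removed_ranges = [(1, 3), (202, 210), (101, 112), (137, 153)]

removed = set()
for first, last in removed_ranges:
    removed.update(range(first, last + 1))

retained = [b for b in range(1, TOTAL_BANDS + 1) if b not in removed]
print(len(removed), len(retained))   # 41 169

# Panel side lengths (meters) for columns 1, 2 and 3, expressed in pixel widths.
panel_sides_m = [3.0, 2.0, 1.0]
panel_widths_in_pixels = [side / PIXEL_SIZE_M for side in panel_sides_m]
print(panel_widths_in_pixels)
```

A panel in the third column spans only about two thirds of a pixel width, so it can occupy at most a fraction of any pixel, which is why such panels must be treated as subpixel targets.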




Fig. 1.8 plots the five panel spectral signatures obtained from Fig. 1.7(b), where the 
i-th panel signature, denoted by Pi was generated by averaging the red panel center pixels 
in row i. These panel signatures will be used to represent target knowledge of the panels 
in each row. 
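
The averaging step used to form each panel signature can be sketched as follows. The pixel values below are hypothetical three-band stand-ins for the red panel center pixels of one row, and the function name is an assumption made for illustration.

```python
def mean_signature(pixels):
    """Band-wise average of a list of pixel vectors."""
    n = len(pixels)
    return [sum(band_values) / n for band_values in zip(*pixels)]

# Hypothetical center pixels of the panels in one row (three bands each).
center_pixels = [
    [0.20, 0.40, 0.60],
    [0.40, 0.80, 1.00],
]

panel_signature = mean_signature(center_pixels)
print(panel_signature)
```

Averaging only the center pixels keeps the mixed boundary pixels, which also contain background, out of the signature that represents the panel material.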




Figure 1.8. Spectra of P1, P2, P3, P4 and P5



1.2.5 Notations to Be Used in the Book 

Since this book primarily deals with real hyperspectral data, in which the image pixels are
generally mixed and not necessarily pure, the term “endmember” is not used here;
instead, a general term “signature” or “signature vector” is used. In addition, since we are
only interested in target detection and classification, the term “targets” instead of
“materials” is also used throughout the book. In order to make a distinction between a
target pixel and its spectral signature vector, we use notation t to represent a target pixel
or signature, r for an image pixel vector and m for a spectral signature vector. We also
use bold upper case for matrices and bold lower case for vectors. The italic upper case L 
will be used for the total number of spectral bands, for the sample spectral 

covariance matrix and for the sample spectral correlation matrix. Also, <5, (r) is 
used to represent a detector or classifier that operates on an image pixel vector r where the 
subscript * in 5^ (r) specifies what type of a detector or classifier to be used. It should 
be noted that <5, (r) is a real-valued function that takes a form of inner product of a filter 

vector w* with r, that is, j r with the filter vector w* specified by a 

particular detector or classifier. We also use $\boldsymbol{\alpha}$ and $\hat{\boldsymbol{\alpha}}$ to represent the abundance vector
and its estimate, where the “hat” over $\boldsymbol{\alpha}$ indicates “estimate”. Finally, all the
acronyms used in this book are provided in the glossary at the end of the
book.

Spectral angle mapper (SAM) has been widely used as a spectral similarity measure
in remote sensing. It measures the angle between two spectral signature vectors, which
provides a useful feature for material identification. Euclidean distance (ED) is a distance
measure commonly used in spatial-based image processing, generally to
measure the closeness between two data samples. When it is applied
to spectral pixel vectors, it can also be used to measure the similarity between two
spectral pixel vectors. Despite their difference in form, SAM and ED turn out to be
essentially the same measure when the angle between two spectral pixel vectors is small. A common
drawback of SAM and ED is that they are deterministic measures in the sense that they do not
provide a stochastic description of pixel vectors. As a hyperspectral sensor significantly
improves the spectral resolution, its sensitivity to noise and its level of information also
increase. Such sensitivity may be better characterized by uncertainty. Using a
deterministic measure such as SAM or ED may not be an effective way to describe
the random behavior of spectral variability. PART I consists of a single chapter, Chapter 2,
which addresses this issue by considering a hyperspectral pixel vector as a random variable
with its spectrum as a probability distribution. As a result of this interpretation, a
spectral information measure (SIM) can be defined by the concept of self-information
borrowed from information theory. With the use of self-information, two information
theoretic measures, called spectral information divergence (SID) and hidden Markov
model-based information divergence (HMMID), can be derived to measure similarity
between hyperspectral image pixel vectors. In the past, many similarity measures
developed in pattern classification have also been used as hyperspectral measures. In
order to compare one against another, a new definition of relative spectral discriminatory
power (RSDPW) is also suggested in Chapter 2. On some occasions we are also
interested in identifying a pixel vector of interest using an existing database or a spectral
library. For this purpose, a new criterion, called relative spectral discriminatory
probability (RSDPB), is also proposed. Finally, an uncertainty measure, called relative
spectral discriminatory entropy (RSDE), can be derived from RSDPB and
used to measure the uncertainty of a similarity measure in material identification using a
spectral library.

CHARACTERIZATION



A hyperspectral image can be considered as an image cube where the third dimension
is represented by hundreds of contiguous spectral bands. As a result, a hyperspectral pixel
is actually a column vector with dimension equal to the number of spectral bands. Such
between-band spectral information is very useful and can be used for spectral
characterization. Many measures proposed in signal processing and pattern recognition
can be used for this purpose. Nevertheless, most of them are spatial-based measures, and
they are not particularly designed to measure spectral properties inherent in a single pixel
vector. In this chapter, two new hyperspectral measures, the spectral information measure
(SIM) (Chang, 2000) and the hidden Markov model (HMM)-based spectral measure (Du and
Chang, 2001), are presented, both of which are derived from the Kullback-Leibler
information distance to capture the spectral variability of a pixel vector. Additionally,
spectral information divergence (SID), relative spectral discriminatory probability,
relative spectral discriminatory power and relative spectral discriminatory entropy are also
introduced to further account for spectral similarity and discriminability among pixel
vectors.



2.1 MEASURES OF SPECTRAL VARIABILITY

A hyperspectral image is generally acquired by hundreds of spectral channels. As a
result, a scene pixel vector is usually represented by a column vector, in which each
component contains specific spectral information provided by a particular channel.
Therefore, a greater number of spectral channels translates into more spectral information. This
implies that a hyperspectral image pixel vector generally contains more spectral
information than does a multispectral image pixel vector. In many situations, such
spectral information is valuable and crucial in data analysis. In order to capture and
characterize the spectral properties provided in a single pixel vector by hundreds of
bands, two statistical spectral measures are introduced in this section.

2.1.1 Spectral Information Measure (SIM)

The spectral information measure (SIM) is an information theoretic measure that
models the spectral band-to-band variability as uncertainty resulting from randomness. It
considers each pixel vector as a random variable with the probability distribution
obtained by normalizing its spectral histogram to unity. With this interpretation, SIM
can measure the spectral variability of a single hyperspectral pixel vector resulting from
band-to-band correlation. It not only can describe the randomness of a pixel vector, but
also can generate high-order statistics of each pixel vector based on its spectral histogram.
So, SIM can be considered as a single pixel vector-based stochastic measure. It is
particularly useful for hyperspectral image analysis because the spectral
information provided by each hyperspectral image pixel vector can improve material
detection, discrimination, classification and identification. However, this advantage is
also traded for the extraction of many unknown interferers, which cannot be identified a priori
(Chang et al., 1998). The effects caused by such interference can only be described by
randomness and cannot be characterized deterministically. SIM is designed to meet this
need. It can capture the uncertainty created by unknown signal sources in a stochastic
manner. Therefore, the higher the data dimensionality, the more the randomness, which
enhances the relative effectiveness of SIM. Because SIM is a statistical measure, it can
generate higher order statistics that can further be used to characterize spectral variability,
such as variance (a second order statistic that measures spread), skewness
(a third order statistic that measures asymmetry) and kurtosis (a fourth order statistic that
measures flatness). This advantage cannot be achieved by any deterministic measure
such as Euclidean distance or spectral angle mapper (SAM) (Schowengerdt, 1997).

For a given hyperspectral pixel vector $\mathbf{x} = (x_1, \cdots, x_L)^T$, each component $x_l$
represents a pixel in the $l$-th band image, which is acquired at a certain wavelength $\omega_l$ in a
specific spectral range. Let $\mathbf{s} = (s_1, \cdots, s_L)^T$ be the corresponding spectral signature (i.e.,
spectrum) of $\mathbf{x}$, where $s_l$ represents the spectral signature of $x_l$ in the form of either
radiance or reflectance values. Suppose that $\{\omega_l\}_{l=1}^{L}$ is a set of $L$ wavelengths, each of
which corresponds to a spectral band channel. Then $\mathbf{x}$ can be modeled as a random
variable by defining an appropriate probability space $(\Omega, \Sigma, P)$ associated with it, where $\Omega$
is a sample space, $\Sigma$ is an event space and $P$ is a probability measure. In this case, we let
$\Omega = \{\omega_1, \omega_2, \cdots, \omega_L\}$ be the sample space, $\Sigma$ be the power set of $\Omega$, i.e., the set
of all subsets of $\Omega$, and $x(\omega_l) = s_l$. In order to define a legitimate probability measure $P$
for $\mathbf{x}$, we first assume that all components $s_l$ associated with $\mathbf{x}$ are nonnegative. This is
generally a valid assumption due to the nature of radiance or reflectance. With this
assumption, we can normalize the $s_l$'s to the range of [0, 1] as follows,

$$p_l = \frac{s_l}{\sum_{k=1}^{L} s_k}. \qquad (2.1)$$

Using (2.1) we define a probability measure $P$ for $\mathbf{x}$ by

$$P(\{\omega_l\}) = p_l. \qquad (2.2)$$

The probability vector $\mathbf{p} = (p_1, p_2, \cdots, p_L)^T$ is the desired probability distribution of the
pixel vector $\mathbf{x}$. By means of this probability interpretation, any pixel vector
$\mathbf{x} = (x_1, \cdots, x_L)^T$ can be viewed as a single information source with its statistics governed
by $\mathbf{p} = (p_1, p_2, \cdots, p_L)^T$ defined by (2.1) and (2.2). As a result, these statistics can be
used to describe the spectral variability of a pixel vector. For instance, we can define its
statistics of different orders, such as the mean $\mu(\mathbf{x}) = \sum_{l=1}^{L} p_l s_l$, the variance
$\sigma^2(\mathbf{x}) = \sum_{l=1}^{L} p_l (s_l - \mu(\mathbf{x}))^2$, the third central moment $\kappa_3(\mathbf{x}) = \sum_{l=1}^{L} p_l (s_l - \mu(\mathbf{x}))^3$, the fourth
central moment $\kappa_4(\mathbf{x}) = \sum_{l=1}^{L} p_l (s_l - \mu(\mathbf{x}))^4$, etc. Like the Taylor series, which is widely
used to approximate a deterministic function, we can also use the moment generating
function to fully describe the probabilistic behavior of the spectral signature of each
hyperspectral image pixel vector, where all moments can be obtained via (2.1) and (2.2).

Since a hyperspectral image pixel vector $\mathbf{x}$ can be considered as an information
source described by (2.1) and (2.2), from information theory (Fano, 1960) we can further
define its self-information provided by band $l$ by

$$I_l(\mathbf{x}) = -\log p_l. \qquad (2.3)$$

Using (2.3) the entropy of each hyperspectral image pixel vector $\mathbf{x}$, $H(\mathbf{x})$, can be obtained
as

$$H(\mathbf{x}) = -\sum_{l=1}^{L} p_l \log p_l \qquad (2.4)$$

which can be used to describe the uncertainty resulting from the pixel vector $\mathbf{x}$.
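
As a rough illustration of (2.1)-(2.4), the following Python sketch normalizes a signature to a probability vector and computes its mean, variance, third and fourth central moments and entropy. The function names and the four-band signature are hypothetical, and the base-2 logarithm is an assumption, since the text does not fix the base of the logarithm.

```python
import math


def probability_vector(s):
    """Normalize a nonnegative signature to unit sum, as in (2.1)."""
    total = sum(s)
    if total <= 0 or any(v < 0 for v in s):
        raise ValueError("components must be nonnegative with a positive sum")
    return [v / total for v in s]


def sim_statistics(s):
    """Return the mean, central moments of order 2-4 and the entropy (2.4)."""
    p = probability_vector(s)
    mean = sum(pl * sl for pl, sl in zip(p, s))

    def central(order):
        return sum(pl * (sl - mean) ** order for pl, sl in zip(p, s))

    # Entropy in bits (assumed base 2); bands with p_l = 0 contribute nothing.
    entropy = -sum(pl * math.log2(pl) for pl in p if pl > 0)
    return {
        "mean": mean,
        "variance": central(2),
        "kappa3": central(3),
        "kappa4": central(4),
        "entropy": entropy,
    }


s = [0.10, 0.20, 0.30, 0.40]  # hypothetical 4-band reflectance values
stats = sim_statistics(s)
```

A flat signature gives the uniform distribution and therefore the largest possible entropy, $\log L$, which is the least informative case.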

2.1.2 Hidden Markov Model (HMM)-Based Measure 

Another statistical measure makes use of a hidden Markov model (HMM) to capture
the unobserved and hidden spectral properties of a hyperspectral image pixel vector.
HMM has been widely used in speech recognition (Rabiner and Juang, 1993) to model a
speech signal as a doubly stochastic process with a hidden state process that can only be
observed through a sequence of observations. Since the temporal variability of a speech
signal is similar to the spectral variability of a hyperspectral image pixel vector, a similar
idea can be applied to a hyperspectral pixel vector. In this case, a hidden Markov
process is used to characterize spectral correlation as well as band-to-band variability, with
model parameters determined by the spectrum of the pixel vector that forms an
observation sequence. In speech processing, the same word spoken at different times
generally results in different speech spectra. Similarly, the
spectrum of the same material taken at different times varies.

In analogy with SID, we introduce a new HMM-based spectral measure, referred to
as HMM information divergence (HMMID), that can also be derived from the self-information
specified by (2.3) and (2.4) (Du and Chang, 2001).

Let $\mathbf{o} = (o_1, o_2, \cdots, o_T)$ be an observation process with $o_t$ being the observation
taken at time $t$ and $T$ the number of observations made in the process. Assume
that there are $N$ states denoted by $\{1, 2, \cdots, N\}$ and the state at time $t$ is denoted by $q_t$.
Let $\mathbf{A} = \{a_{ij}\}_{1 \le i,j \le N}$ be the state transition matrix with $a_{ij}$ given by

$$a_{ij} = P(q_{t+1} = j \mid q_t = i) \qquad (2.5)$$

and $\mathbf{B} = \{b_j(o_t)\}_{1 \le j \le N,\, 1 \le t \le T}$ be the observation probability density matrix, where $b_j(o_t)$ is
the probability density of the observation at time $t$ in the $j$-th state. We further assume
that $\boldsymbol{\pi} = \{\pi_j\}_{1 \le j \le N}$ is the initial state distribution with $\pi_j$ given by

$$\pi_j = P(q_1 = j). \qquad (2.6)$$

So, an HMM can be uniquely defined by a parameter triplet, denoted by $\lambda = (\mathbf{A}, \mathbf{B}, \boldsymbol{\pi})$,
which can be estimated by the Baum-Welch algorithm using maximum likelihood
estimation as follows (Rabiner and Juang, 1993).

Assume that the probability density $b_j(o_t)$ is a Gaussian mixture. It has been shown
in Rabiner and Juang (1986) that it is equivalent to a multistage single Gaussian density
given by

$$b_j(o_t) = \frac{1}{\sqrt{2\pi}\,\sigma_j} \exp\left(-\frac{(o_t - \mu_j)^2}{2\sigma_j^2}\right) \qquad (2.7)$$

where $\mu_j$ and $\sigma_j^2$ are the mean and variance of the observation $o_t$ in the $j$-th state
respectively.

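Now, we define the forward probability $\alpha_j(t)$ as the joint probability of observing
the first $t$ observations $o_1, \cdots, o_t$ and being in the $j$-th state at time $t$. It can be computed
recursively by

$$\alpha_j(1) = \pi_j b_j(o_1) \qquad (2.8)$$

$$\alpha_j(t) = \left[\sum_{i=1}^{N} \alpha_i(t-1)\, a_{ij}\right] b_j(o_t) \quad \text{for } 1 < t \le T. \qquad (2.9)$$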
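
The forward recursion (2.8)-(2.9) with the Gaussian state density (2.7) can be sketched in a few lines of Python. This is only a minimal illustration: the two-state model, its parameter values and the four-sample observation sequence are hypothetical, and no scaling is applied, so for long sequences (hundreds of bands) a scaled or log-domain version would be needed to avoid numerical underflow.

```python
import math


def gaussian_density(o, mean, var):
    """Single Gaussian state density b_j(o_t) of (2.7)."""
    return math.exp(-((o - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def forward(obs, A, pi, means, variances):
    """Return the forward table alpha[t][j] of (2.8)-(2.9); t is 0-based here."""
    n_states = len(pi)
    alpha = [[pi[j] * gaussian_density(obs[0], means[j], variances[j])
              for j in range(n_states)]]
    for t in range(1, len(obs)):
        prev = alpha[-1]
        alpha.append([
            sum(prev[i] * A[i][j] for i in range(n_states))
            * gaussian_density(obs[t], means[j], variances[j])
            for j in range(n_states)
        ])
    return alpha


def likelihood(obs, A, pi, means, variances):
    """P(o | lambda), obtained by summing alpha_j(T) over all states j."""
    return sum(forward(obs, A, pi, means, variances)[-1])


# Hypothetical two-state model and a short spectrum used as the observation sequence.
A = [[0.7, 0.3], [0.4, 0.6]]
pi = [0.5, 0.5]
means, variances = [0.2, 0.6], [0.01, 0.04]
obs = [0.18, 0.25, 0.55, 0.62]
p_obs = likelihood(obs, A, pi, means, variances)
```

The recursion costs on the order of $N^2 T$ operations, whereas summing over every possible state sequence would cost on the order of $N^T$.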



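Similarly, we define the backward probability $\beta_j(t)$ as the conditional probability of
observing the observations $o_{t+1}, \cdots, o_T$ given that the state at time $t$ is $j$. It can be
computed recursively by

$$\beta_j(T) = 1 \qquad (2.10)$$

$$\beta_j(t) = \sum_{i=1}^{N} a_{ji}\, b_i(o_{t+1})\, \beta_i(t+1) \quad \text{for } 1 \le t < T. \qquad (2.11)$$

Let $\gamma_j(t)$ denote the conditional probability of being in the $j$-th state at time $t$ given the observation
$\mathbf{o} = (o_1, \cdots, o_T)$, and let $\xi_{ij}(t)$ denote the conditional probability of a transition from state
$i$ to state $j$ at time $t+1$ given the observation $\mathbf{o} = (o_1, \cdots, o_T)$. Then they can be
calculated as follows

$$\gamma_j(t) = \frac{\alpha_j(t)\,\beta_j(t)}{\sum_{i=1}^{N} \alpha_i(t)\,\beta_i(t)} \qquad (2.12)$$

$$\xi_{ij}(t) = \frac{\alpha_i(t)\, a_{ij}\, b_j(o_{t+1})\, \beta_j(t+1)}{\sum_{k=1}^{N} \sum_{m=1}^{N} \alpha_k(t)\, a_{km}\, b_m(o_{t+1})\, \beta_m(t+1)} \qquad (2.13)$$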


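Using (2.12) and (2.13), the parameters $\mu_j$, $\sigma_j^2$, $a_{ij}$ and $\pi_j$ can be estimated by the following equations

$$\hat{\mu}_j = \frac{\sum_{t=1}^{T} \gamma_j(t)\, o_t}{\sum_{t=1}^{T} \gamma_j(t)} \qquad (2.14)$$

$$\hat{\sigma}_j^2 = \frac{\sum_{t=1}^{T} \gamma_j(t)\, (o_t - \hat{\mu}_j)^2}{\sum_{t=1}^{T} \gamma_j(t)} \qquad (2.15)$$

$$\hat{a}_{ij} = \frac{\sum_{t=1}^{T-1} \xi_{ij}(t)}{\sum_{t=1}^{T-1} \gamma_i(t)} \qquad (2.16)$$

$$\hat{\pi}_j = \gamma_j(1). \qquad (2.17)$$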
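
To make the re-estimation step more concrete, the sketch below computes the forward, backward and state posterior tables and then applies the updates for the state means, variances and initial distribution. It is a simplified illustration under stated assumptions, not a full Baum-Welch implementation: the transition update (2.16) is omitted, no scaling is applied, and the two-state model and observation values are hypothetical.

```python
import math


def density(o, mean, var):
    """Single Gaussian state density of (2.7)."""
    return math.exp(-((o - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def forward_backward(obs, A, pi, means, variances):
    """Return (alpha, beta, gamma), each indexed [t][j] with 0-based t."""
    n, T = len(pi), len(obs)
    b = [[density(obs[t], means[j], variances[j]) for j in range(n)] for t in range(T)]
    alpha = [[pi[j] * b[0][j] for j in range(n)]]                      # (2.8)
    for t in range(1, T):                                              # (2.9)
        alpha.append([sum(alpha[t - 1][i] * A[i][j] for i in range(n)) * b[t][j]
                      for j in range(n)])
    beta = [[1.0] * n for _ in range(T)]                               # (2.10)
    for t in range(T - 2, -1, -1):                                     # (2.11)
        beta[t] = [sum(A[j][i] * b[t + 1][i] * beta[t + 1][i] for i in range(n))
                   for j in range(n)]
    gamma = []
    for t in range(T):                                                 # (2.12)
        norm = sum(alpha[t][i] * beta[t][i] for i in range(n))
        gamma.append([alpha[t][j] * beta[t][j] / norm for j in range(n)])
    return alpha, beta, gamma


def reestimate(obs, gamma):
    """Apply (2.14), (2.15) and (2.17); the transition update (2.16) is omitted."""
    n, T = len(gamma[0]), len(obs)
    new_means, new_vars = [], []
    for j in range(n):
        weight = sum(gamma[t][j] for t in range(T))
        mu = sum(gamma[t][j] * obs[t] for t in range(T)) / weight
        var = sum(gamma[t][j] * (obs[t] - mu) ** 2 for t in range(T)) / weight
        new_means.append(mu)
        new_vars.append(var)
    return new_means, new_vars, list(gamma[0])


# Hypothetical two-state model and observation sequence.
A = [[0.7, 0.3], [0.4, 0.6]]
pi = [0.5, 0.5]
means, variances = [0.2, 0.6], [0.01, 0.04]
obs = [0.18, 0.25, 0.55, 0.62]
alpha, beta, gamma = forward_backward(obs, A, pi, means, variances)
new_means, new_vars, new_pi = reestimate(obs, gamma)
```

In practice the updates are iterated, recomputing the tables with the new parameters each time, until the likelihood of the observation sequence stops improving.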



As indicated previously, an endmember may be represented by variants of its true
spectral signature from pixel vector to pixel vector because of unpredicted mixing
occurring in a pixel vector. However, there must be some unobserved properties
governed by this particular endmember that distinguish it from other
endmembers. This spectral characterization is very similar to that of speech signals, one of
whose key features is pitch. If we assume the observation sequence $\mathbf{o} = (o_1, \cdots, o_T)$ is
represented by the spectral signature $\mathbf{s}$ of a pixel vector $\mathbf{x}$ in a hyperspectral image, we can
use the HMM to capture the unobserved and hidden spectral properties of $\mathbf{s}$. Let $\lambda_s$ be
the parameter vector used to specify $\mathbf{s}$ and HMM($\lambda_s$) be the HMM determined by $\lambda_s$. In
analogy with (2.3) we can define the self-information of $\mathbf{s}$ provided by HMM($\lambda_s$),
denoted by $I_{\mathrm{HMM}(\lambda_s)}(\mathbf{s})$, as follows



(2.18)

2.2 SPECTRAL SIMILARITY MEASURES

In the previous section, two new hyperspectral measures were suggested to capture
spectral variations of a single pixel vector. In this section, we consider the issue of how
to measure the similarity between two pixel vectors. In the past, many pure pixel vector-based
similarity measures have been used to evaluate the similarity between two pixel
vectors. However, they may not be effective for hyperspectral pixel vectors since they do
not take full advantage of band-to-band spectral correlation. For the purpose of comparison,
we describe several commonly used measures which can be used for spectral similarity.

First, we assume that there are two pixel vectors $\mathbf{r}_i = (r_{i1}, \cdots, r_{iL})^T$ and $\mathbf{r}_j = (r_{j1}, \cdots, r_{jL})^T$
with their respective spectral signatures given by $\mathbf{s}_i = (s_{i1}, \cdots, s_{iL})^T$ and $\mathbf{s}_j = (s_{j1}, \cdots, s_{jL})^T$.

2.2.1 Commonly Used Measures 

Three different types of measures are of interest in pattern classification and can be 
briefly described as follows. 

2.2.1.1 Distance-Based Measures

In multivariate analysis, communications and signal processing, distance-based
measures are the most commonly used metrics to measure the distance between sample data
points. Three metrics that calculate the distance between the spectral signatures of two
pixel vectors, $\mathbf{s}_i$ and $\mathbf{s}_j$, can be derived from $l_p$-norms in real analysis.

• City block distance (CBD) corresponding to the $l_1$-norm

$$\mathrm{CBD}(\mathbf{s}_i, \mathbf{s}_j) = \sum_{l=1}^{L} |s_{il} - s_{jl}|. \qquad (2.19)$$

• Euclidean distance (ED) corresponding to the $l_2$-norm

$$\mathrm{ED}(\mathbf{s}_i, \mathbf{s}_j) = \|\mathbf{s}_i - \mathbf{s}_j\| \equiv \left[\sum_{l=1}^{L} (s_{il} - s_{jl})^2\right]^{1/2}. \qquad (2.20)$$

• Tchebyshev distance (TD) or maximum distance corresponding to the $l_\infty$-norm

$$\mathrm{TD}(\mathbf{s}_i, \mathbf{s}_j) = \max_{1 \le l \le L} \{|s_{il} - s_{jl}|\}. \qquad (2.21)$$
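
The three distances translate directly into code. The short Python sketch below uses hypothetical function names and two hypothetical three-band signatures, purely for illustration.

```python
import math


def cbd(s_i, s_j):
    """City block distance (l1-norm), as in (2.19)."""
    return sum(abs(a - b) for a, b in zip(s_i, s_j))


def ed(s_i, s_j):
    """Euclidean distance (l2-norm), as in (2.20)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(s_i, s_j)))


def td(s_i, s_j):
    """Tchebyshev (maximum) distance (l-infinity norm), as in (2.21)."""
    return max(abs(a - b) for a, b in zip(s_i, s_j))


s_i = [0.2, 0.4, 0.6]  # hypothetical signatures
s_j = [0.1, 0.4, 0.9]
distances = (cbd(s_i, s_j), ed(s_i, s_j), td(s_i, s_j))
```

For any pair of signatures the three values are ordered as TD ≤ ED ≤ CBD, which follows from the usual ordering of the $l_\infty$, $l_2$ and $l_1$ norms.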

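2.2.1.2 Orthogonal Projection-Based Measures

Two measures can be derived from orthogonal projection, SAM and orthogonal
projection divergence (OPD) (Ren and Chang, 1998).

SAM is a widely used spectral similarity metric in remote sensing. It measures
spectral similarity by finding the angle between the spectral signatures of two pixel
vectors, $\mathbf{s}_i$ and $\mathbf{s}_j$.

$$\mathrm{SAM}(\mathbf{s}_i, \mathbf{s}_j) = \cos^{-1}\left(\frac{\langle \mathbf{s}_i, \mathbf{s}_j \rangle}{\|\mathbf{s}_i\|\,\|\mathbf{s}_j\|}\right) = \cos^{-1}\left(\frac{\sum_{l=1}^{L} s_{il} s_{jl}}{\left[\sum_{l=1}^{L} s_{il}^2\right]^{1/2} \left[\sum_{l=1}^{L} s_{jl}^2\right]^{1/2}}\right) \qquad (2.22)$$

The concept of the OPD originates from the orthogonal subspace projection (OSP)
(Harsanyi and Chang, 1994). It finds the residuals of orthogonal projections resulting
from two pixel vectors, $\mathbf{s}_i$ and $\mathbf{s}_j$, given by

$$\mathrm{OPD}(\mathbf{s}_i, \mathbf{s}_j) = \left(\mathbf{s}_i^T P_{\mathbf{s}_j}^{\perp} \mathbf{s}_i + \mathbf{s}_j^T P_{\mathbf{s}_i}^{\perp} \mathbf{s}_j\right)^{1/2} \qquad (2.23)$$

where $P_{\mathbf{s}_k}^{\perp} = \mathbf{I}_{L \times L} - \mathbf{s}_k (\mathbf{s}_k^T \mathbf{s}_k)^{-1} \mathbf{s}_k^T$ for $k = i, j$ and $\mathbf{I}_{L \times L}$ is the $L \times L$ identity matrix.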
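
Both measures can be evaluated with inner products alone, since projecting onto the orthogonal complement of a single vector gives $\mathbf{s}_i^T P_{\mathbf{s}_j}^{\perp} \mathbf{s}_i = \|\mathbf{s}_i\|^2 - \langle \mathbf{s}_i, \mathbf{s}_j \rangle^2 / \|\mathbf{s}_j\|^2$. The Python sketch below relies on this identity; the function names are hypothetical, and the clamping steps are only a guard against rounding error.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def sam(s_i, s_j):
    """Spectral angle in radians between two signatures."""
    c = dot(s_i, s_j) / math.sqrt(dot(s_i, s_i) * dot(s_j, s_j))
    return math.acos(max(-1.0, min(1.0, c)))  # clamp against rounding error


def opd(s_i, s_j):
    """Orthogonal projection divergence, using the single-vector projector."""
    cross = dot(s_i, s_j) ** 2
    residual_i = dot(s_i, s_i) - cross / dot(s_j, s_j)  # s_i^T P_perp(s_j) s_i
    residual_j = dot(s_j, s_j) - cross / dot(s_i, s_i)  # s_j^T P_perp(s_i) s_j
    return math.sqrt(max(0.0, residual_i + residual_j))


# Orthogonal signatures give the largest angle; parallel ones give zero for both.
angle_orthogonal = sam([1.0, 0.0], [0.0, 1.0])
opd_orthogonal = opd([1.0, 0.0], [0.0, 1.0])
```

Both measures vanish when one signature is a positive multiple of the other, but SAM is unaffected by the magnitudes of the two signatures in general, whereas OPD grows with them.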

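It is worth noting that if both $\mathbf{s}_i$ and $\mathbf{s}_j$ are normalized to unity, the relationship
between $\mathrm{ED}(\mathbf{s}_i, \mathbf{s}_j)$ and $\mathrm{SAM}(\mathbf{s}_i, \mathbf{s}_j)$ can be established as follows.

$$\mathrm{ED}(\mathbf{s}_i, \mathbf{s}_j) = \sqrt{2 - 2\langle \mathbf{s}_i, \mathbf{s}_j \rangle} = \sqrt{2\left(1 - \cos(\mathrm{SAM}(\mathbf{s}_i, \mathbf{s}_j))\right)} = 2\sqrt{\frac{1 - \cos(\mathrm{SAM}(\mathbf{s}_i, \mathbf{s}_j))}{2}} = 2 \sin\left(\frac{\mathrm{SAM}(\mathbf{s}_i, \mathbf{s}_j)}{2}\right) \qquad (2.24)$$

where $\langle \mathbf{s}_i, \mathbf{s}_j \rangle = \sum_{l=1}^{L} s_{il} s_{jl}$.

When $\mathrm{SAM}(\mathbf{s}_i, \mathbf{s}_j)$ is small, $2\sin(\mathrm{SAM}(\mathbf{s}_i, \mathbf{s}_j)/2) \approx \mathrm{SAM}(\mathbf{s}_i, \mathbf{s}_j)$, in which case
$\mathrm{SAM}(\mathbf{s}_i, \mathbf{s}_j)$ is nearly the same as $\mathrm{ED}(\mathbf{s}_i, \mathbf{s}_j)$. Fig. 2.1 shows the geometric
interpretations of ED and SAM, and their relationship.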
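
Relationship (2.24) is easy to verify numerically. The following Python sketch normalizes two hypothetical signatures to unit length and compares their Euclidean distance with $2\sin(\mathrm{SAM}/2)$; all names and values are illustrative assumptions.

```python
import math


def unit(s):
    """Scale a signature to unit Euclidean length."""
    norm = math.sqrt(sum(v * v for v in s))
    return [v / norm for v in s]


def ed(s_i, s_j):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(s_i, s_j)))


def sam(s_i, s_j):
    inner = sum(a * b for a, b in zip(s_i, s_j))
    norms = math.sqrt(sum(a * a for a in s_i) * sum(b * b for b in s_j))
    return math.acos(max(-1.0, min(1.0, inner / norms)))


u = unit([0.2, 0.4, 0.6])  # hypothetical signatures
v = unit([0.1, 0.4, 0.9])
angle = sam(u, v)
distance = ed(u, v)
half_angle_form = 2.0 * math.sin(angle / 2.0)  # right-hand side of (2.24)
```

The same relation links the tabulated experimental values later in the chapter: a SAM value of 0.1767 radians gives $2\sin(0.1767/2) \approx 0.1765$, the ED value reported for the same pair of signatures.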




Figure 2.1. Geometric relationship between ED and SAM 

2.2.2 Spectral Information Divergence (SID) 

A new criterion to measure spectral similarity, called spectral information divergence
(SID), is presented in this section (Chang, 2000). It originates from the concept of
divergence in information theory and measures the discrepancy of probabilistic behaviors
between the spectral signatures of two pixel vectors. In other words, the spectral
similarity between two pixel vectors is measured by SID based on the discrepancy
between their corresponding spectral signature-derived probability distributions. The idea
of using divergence is not new; it has also been used in pattern recognition (Tou and
Gonzalez, 1974) and band selection (Mausel et al., 1990, Conese and Maselli, 1993,
Stearns et al., 1993, Jensen, 1996), for example in the Jeffries-Matusita measure. What is new for
SID is that it is designed from SIM for spectral similarity. Compared with SAM and ED,
which extract geometric features, i.e. the angle and spatial distance between two pixel vectors,
SID measures the distance between the probability distributions produced by the spectral
signatures of two pixel vectors. Accordingly, SID may be more effective than ED and
SAM in capturing spectral variability.

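Now, following (2.1) we calculate the probability vectors $\mathbf{p} = (p_1, \cdots, p_L)^T$ and
$\mathbf{q} = (q_1, \cdots, q_L)^T$ for the spectral signatures of two pixel vectors, $\mathbf{s}_i$ and $\mathbf{s}_j$, where
$p_l = s_{il} / \sum_{k=1}^{L} s_{ik}$ and $q_l = s_{jl} / \sum_{k=1}^{L} s_{jk}$. So, the self-information provided by $\mathbf{r}_j$ for band
$l$ is defined by (2.3) and given by

$$I_l(\mathbf{r}_j) = -\log q_l. \qquad (2.25)$$

Using (2.3) and (2.25) we can further define $D_l(\mathbf{r}_i \| \mathbf{r}_j)$ (Cover and Thomas, 1993), the
discrepancy in the self-information of band $l$ in $\mathbf{r}_j$ relative to the self-information of band
$l$ in $\mathbf{r}_i$, by

$$D_l(\mathbf{r}_i \| \mathbf{r}_j) = I_l(\mathbf{r}_j) - I_l(\mathbf{r}_i) = (-\log q_l) - (-\log p_l) = \log(p_l / q_l). \qquad (2.26)$$

Averaging $D_l(\mathbf{r}_i \| \mathbf{r}_j)$ in (2.26) over all bands $1 \le l \le L$ with respect to $\mathbf{r}_i$ results in

$$D(\mathbf{r}_i \| \mathbf{r}_j) = \sum_{l=1}^{L} p_l D_l(\mathbf{r}_i \| \mathbf{r}_j) = \sum_{l=1}^{L} p_l \left(I_l(\mathbf{r}_j) - I_l(\mathbf{r}_i)\right) = \sum_{l=1}^{L} p_l \log(p_l / q_l) \qquad (2.27)$$

where $D(\mathbf{r}_i \| \mathbf{r}_j)$ is the average discrepancy in the self-information of $\mathbf{r}_j$ relative to the
self-information of $\mathbf{r}_i$. In the context of information theory, $D(\mathbf{r}_i \| \mathbf{r}_j)$ in (2.27) is called the
relative entropy of $\mathbf{r}_j$ with respect to $\mathbf{r}_i$, which is also known as the Kullback-Leibler
information measure, directed divergence or cross entropy (Kullback, 1968). Similarly,
we can also define the average discrepancy in the self-information of $\mathbf{r}_i$ relative to the
self-information of $\mathbf{r}_j$ by

$$D(\mathbf{r}_j \| \mathbf{r}_i) = \sum_{l=1}^{L} q_l D_l(\mathbf{r}_j \| \mathbf{r}_i) = \sum_{l=1}^{L} q_l \left(I_l(\mathbf{r}_i) - I_l(\mathbf{r}_j)\right) = \sum_{l=1}^{L} q_l \log(q_l / p_l). \qquad (2.28)$$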
Summing (2.27) and (2.28) yields spectral information divergence (SID) defined by

$$\mathrm{SID}(\mathbf{r}_i, \mathbf{r}_j) = D(\mathbf{r}_i \| \mathbf{r}_j) + D(\mathbf{r}_j \| \mathbf{r}_i), \qquad (2.29)$$

which can be used to measure the spectral similarity between two pixel vectors $\mathbf{r}_i$ and
$\mathbf{r}_j$. It should be noted that while $\mathrm{SID}(\mathbf{r}_i, \mathbf{r}_j)$ is symmetric, $D(\mathbf{r}_i \| \mathbf{r}_j)$ is not. That is,
$\mathrm{SID}(\mathbf{r}_i, \mathbf{r}_j) = \mathrm{SID}(\mathbf{r}_j, \mathbf{r}_i)$, whereas $D(\mathbf{r}_i \| \mathbf{r}_j) \ne D(\mathbf{r}_j \| \mathbf{r}_i)$ in general. Compared to other popular
similarity measures, SID offers a new look at spectral similarity by taking advantage of
relative entropy to account for the spectral information provided by each pixel vector.

A measure similar to SID, called the Jeffries-Matusita measure, which has been used for
band selection (Jensen, 1996), can also be defined as the Jeffries-Matusita distance (JMD) by

$$\mathrm{JMD}(\mathbf{r}_i, \mathbf{r}_j) = \sqrt{\sum_{l=1}^{L} \left[\sqrt{p_l} - \sqrt{q_l}\right]^2}. \qquad (2.30)$$
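
A minimal Python sketch of the relative entropy, SID and JMD computations is given below. It assumes strictly positive signatures, so that every logarithm is defined, and natural logarithms; the function names and sample signatures are hypothetical.

```python
import math


def probability_vector(s):
    """Normalize a strictly positive signature to unit sum."""
    total = sum(s)
    return [v / total for v in s]


def relative_entropy(p, q):
    """Directed divergence D(p || q) = sum_l p_l log(p_l / q_l)."""
    return sum(pl * math.log(pl / ql) for pl, ql in zip(p, q))


def sid(s_i, s_j):
    """Spectral information divergence: the sum of both directed divergences."""
    p, q = probability_vector(s_i), probability_vector(s_j)
    return relative_entropy(p, q) + relative_entropy(q, p)


def jmd(s_i, s_j):
    """Jeffries-Matusita distance between the two probability vectors."""
    p, q = probability_vector(s_i), probability_vector(s_j)
    return math.sqrt(sum((math.sqrt(pl) - math.sqrt(ql)) ** 2 for pl, ql in zip(p, q)))


s_i = [0.2, 0.4, 0.6]  # hypothetical signatures
s_j = [0.1, 0.4, 0.9]
p, q = probability_vector(s_i), probability_vector(s_j)
forward_divergence = relative_entropy(p, q)
backward_divergence = relative_entropy(q, p)
```

Because each signature is normalized to a probability vector first, SID and JMD are unchanged when a signature is multiplied by a positive constant, a property they share with SAM but not with ED.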

2.2.3 Hidden Markov Model-Based Information Divergence (HMMID) 

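Following an approach similar to SID, we can also define an HMM-based
information divergence for two pixel vectors. Suppose that a hyperspectral pixel vector
with spectral signature $\mathbf{s}$ specified by HMM($\lambda_s$) can also be modeled by another HMM($\lambda$). In this case, the
information discrepancy of $\mathbf{s}$ provided by HMM($\lambda_s$) and HMM($\lambda$) is the entropy of
HMM($\lambda$) relative to HMM($\lambda_s$) and is given by

$$D(\lambda_s; \lambda) = \frac{1}{T}\left[\log P(\mathbf{s} \mid \lambda) - \log P(\mathbf{s} \mid \lambda_s)\right]. \qquad (2.31)$$

Using (2.31) we can further define an HMM-based information distance between two
hyperspectral pixel vectors. Let $\mathbf{r}_i = (r_{i1}, \cdots, r_{iL})^T$ and $\mathbf{r}_j = (r_{j1}, \cdots, r_{jL})^T$ be two pixel vectors, and let
$\mathbf{s}_i = (s_{i1}, \cdots, s_{iL})^T$ and $\mathbf{s}_j = (s_{j1}, \cdots, s_{jL})^T$ be their respective spectral signatures specified by their associated
hidden Markov models, HMM($\lambda_{s_i}$) and HMM($\lambda_{s_j}$). An HMM information divergence
(HMMID) between $\mathbf{r}_i$ and $\mathbf{r}_j$ is defined by Du and Chang (2001) as

$$\mathrm{HMMID}(\mathbf{r}_i, \mathbf{r}_j) = D(\lambda_{s_i}; \lambda_{s_j}) + D(\lambda_{s_j}; \lambda_{s_i}). \qquad (2.32)$$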



2.3 MEASURES OF SPECTRAL DISCRIMINABILITY 

In Section 2.2, spectral similarity measures between two pixel vectors were 
developed. When there are more than two pixel vectors, how do we discriminate one 
from another? 

In this section, we study three measures that can be used for spectral discriminability
among a set of pixel vectors (Chang, 2000).

2.3.1 Relative Spectral Discriminatory ProBability (RSDPB)

In many remote sensing applications we are required to
identify a pixel vector of interest using an existing spectral library or database. Under
this circumstance, we are interested in how likely it is that a particular spectral
signature t will be identified by a selected set of spectral signatures, $\Delta$, which serves as a
spectral database or spectral library. To meet this need, a new criterion, called relative
spectral discriminatory probability (RSDPB), is proposed. It calculates the discriminatory
probabilities of all the spectral signatures in a spectral library or database relative to the
spectral signature of a pixel vector to be identified. The resulting probabilities show the
likelihood of the pixel vector being identified by the materials in the library or database.
Although SAM has also been widely used for this purpose, experiments show
that SID seems to have advantages over SAM in characterizing spectral similarity and
variability. As explained previously, this may be due to the fact that SAM only preserves
the angle between two pixel vectors rather than their random behavior. Since
remotely sensed image pixel vectors are generally corrupted by noise and other unknown
factors during acquisition, information theoretic measures, which allow each pixel vector
a certain degree of spectral variation, may be a better alternative to traditional
measures.

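Let $\{\mathbf{s}_k\}_{k=1}^{K}$ be $K$ spectral signatures in the set $\Delta$, which can be considered as a
database, and t be any specific target spectral signature to be identified using $\Delta$. We define
the spectral discriminatory probabilities of all $\mathbf{s}_k$'s in $\Delta$ relative to t as follows.

$$p_{\mathbf{t},\Delta}(k) = \frac{m(\mathbf{t}, \mathbf{s}_k)}{\sum_{j=1}^{K} m(\mathbf{t}, \mathbf{s}_j)} \quad \text{for } k = 1, 2, \cdots, K \qquad (2.33)$$

where $m(\cdot,\cdot)$ is a given spectral similarity measure and $\sum_{j=1}^{K} m(\mathbf{t}, \mathbf{s}_j)$ is a normalization
constant determined by t and $\Delta$. The resulting
probability vector $\mathbf{p}_{\mathbf{t},\Delta} = (p_{\mathbf{t},\Delta}(1), p_{\mathbf{t},\Delta}(2), \cdots, p_{\mathbf{t},\Delta}(K))^T$ is called the relative spectral
discriminatory probability (RSDPB) of $\Delta$ with respect to t, or the spectral discriminatory
probability vector of $\Delta$ relative to t. Then, using (2.33) we can identify t via $\Delta$ by
selecting the signature with the smallest relative spectral discriminatory probability. If there
is a tie, either one can be used to identify t. Fig. 2.2 shows a graphical representation of the
RSDPB of $\Delta$ with respect to t.

Figure 2.2. Graphical representation of RSDPB, $\mathbf{p}_{\mathbf{t},\Delta}$, of $\Delta$ with respect to t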
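
The identification rule can be sketched as follows. The measure is passed in as a function, so any of the similarity measures of Section 2.2 could be used; Euclidean distance is chosen here only for brevity, and the three-signature library and the target are hypothetical.

```python
import math


def ed(s_i, s_j):
    """Euclidean distance, used here as the similarity measure m(.,.)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(s_i, s_j)))


def rsdpb(t, library, measure):
    """Relative spectral discriminatory probabilities of the library w.r.t. t."""
    scores = [measure(t, s) for s in library]
    total = sum(scores)  # normalization constant; assumed to be positive
    return [v / total for v in scores]


def identify(t, library, measure):
    """Index of the library signature with the smallest probability."""
    probs = rsdpb(t, library, measure)
    return min(range(len(probs)), key=probs.__getitem__)


library = [[0.2, 0.4, 0.6], [0.1, 0.4, 0.9], [0.5, 0.5, 0.5]]  # hypothetical
t = [0.22, 0.41, 0.58]
probs = rsdpb(t, library, ed)
best = identify(t, library, ed)
```

Note that the smallest probability marks the best match, because each of the measures of Section 2.2 takes smaller values for more similar signatures.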

2.3.2 Relative Spectral Discriminatory PoWer (RSDPW) 

In the previous section, we focused on the similarity between the spectral signatures
of two pixel vectors. However, if we are given two spectral similarity measures, how do
we evaluate which one is more effective than the other? In order to cope with this issue, a
new criterion, called relative spectral discriminatory power (RSDPW), is suggested in this
section. It is based on the power of discriminating one pixel vector from
another relative to a reference pixel vector.

Assume that $m(\cdot,\cdot)$ is any given hyperspectral measure. Let d be the spectral
signature of a reference pixel vector and $\mathbf{s}_i$, $\mathbf{s}_j$ be the spectral signatures of any pair of
pixel vectors. The RSDPW of $m(\cdot,\cdot)$, denoted by $\mathrm{RSDPW}_m(\mathbf{s}_i, \mathbf{s}_j; \mathbf{d})$, is defined by

$$\mathrm{RSDPW}_m(\mathbf{s}_i, \mathbf{s}_j; \mathbf{d}) = \max\left\{\frac{m(\mathbf{s}_i, \mathbf{d})}{m(\mathbf{s}_j, \mathbf{d})}, \frac{m(\mathbf{s}_j, \mathbf{d})}{m(\mathbf{s}_i, \mathbf{d})}\right\}. \qquad (2.34)$$

More precisely, $\mathrm{RSDPW}_m(\mathbf{s}_i, \mathbf{s}_j; \mathbf{d})$ selects as the discriminatory power the
maximum of two ratios, the ratio of $m(\mathbf{s}_i, \mathbf{d})$ to $m(\mathbf{s}_j, \mathbf{d})$ and the ratio of $m(\mathbf{s}_j, \mathbf{d})$ to $m(\mathbf{s}_i, \mathbf{d})$.
The $\mathrm{RSDPW}_m(\mathbf{s}_i, \mathbf{s}_j; \mathbf{d})$ defined by (2.34) provides a quantitative index of the spectral
discrimination capability of a specific hyperspectral measure $m(\cdot,\cdot)$ between two spectral
signatures $\mathbf{s}_i$, $\mathbf{s}_j$ relative to d. Obviously, the higher the $\mathrm{RSDPW}_m(\mathbf{s}_i, \mathbf{s}_j; \mathbf{d})$ is, the
better the discriminatory power of $m(\cdot,\cdot)$ is. In addition, $\mathrm{RSDPW}_m(\mathbf{s}_i, \mathbf{s}_j; \mathbf{d})$ is symmetric
and bounded below by one, i.e., $\mathrm{RSDPW}_m(\mathbf{s}_i, \mathbf{s}_j; \mathbf{d}) \ge 1$, with equality if and only if
$m(\mathbf{s}_i, \mathbf{d}) = m(\mathbf{s}_j, \mathbf{d})$, which holds in particular when $\mathbf{s}_i = \mathbf{s}_j$.

2.3.3 Relative Spectral Discriminatory Entropy (RSDE) 

Since $\mathbf{p}_{\mathbf{t},\Delta} = (p_{\mathbf{t},\Delta}(1), p_{\mathbf{t},\Delta}(2), \cdots, p_{\mathbf{t},\Delta}(K))^T$ given by (2.33) is the relative spectral
discriminatory probability vector of t using a selected set of spectral signatures,
$\Delta = \{\mathbf{s}_k\}_{k=1}^{K}$, we can further define the relative spectral discriminatory entropy (RSDE) of
the spectral signature t with respect to the set $\Delta$, denoted by $H_{\mathrm{RSDE}}(\mathbf{t}; \Delta)$, by

$$H_{\mathrm{RSDE}}(\mathbf{t}; \Delta) = -\sum_{k=1}^{K} p_{\mathbf{t},\Delta}(k) \log p_{\mathbf{t},\Delta}(k). \qquad (2.35)$$

Equation (2.35) provides an uncertainty measure of identifying t resulting from using
$\Delta = \{\mathbf{s}_k\}_{k=1}^{K}$. A higher $H_{\mathrm{RSDE}}(\mathbf{t}; \Delta)$ means a smaller chance of identifying t. For example, if
$\mathbf{p}_{\mathbf{t},\Delta}$ is uniformly distributed, i.e., $p_{\mathbf{t},\Delta}(k) = 1/K$ for $k = 1, 2, \cdots, K$, then $H_{\mathrm{RSDE}}(\mathbf{t}; \Delta)$
achieves its maximum, $\log K$; that is, $H_{\mathrm{RSDE}}(\mathbf{t}; \Delta) \le \log K$. This is the worst scenario
of material identification. In other words, when every $\mathbf{s}_k$ in $\Delta$ is equally likely to be used to
identify t, there is no way for us to know which signature is more likely
to be t. In this case, any signature can be used to identify t. However, if $H_{\mathrm{RSDE}}(\mathbf{t}; \Delta)$ is
small, it only indicates that some candidates in $\Delta$ are more likely to be used
to identify t; it does not necessarily mean that their relative discriminability rates are
high. As an illustrative example, we consider a hypothetical case in which the relative spectral
discriminatory probabilities for two signatures u, v in $\Delta$ are very small with equal
probability $2^{-4}$, while the others are all zero except for one signature $\mathbf{s}_k \in \Delta$ with
$p_{\mathbf{t},\Delta}(k) = 14/16$. Then $H_{\mathrm{RSDE}}(\mathbf{t}; \Delta) = 1/2 + (14/16)(4 - \log_2 14)$ bits, which is
0.6686 bits, less than one bit. So, the chance of identifying t must be good. However, this
is only partly true in the sense that we may conclude that t must be either u or v, but
there is no way for us to know which one it is, since their relative spectral
discriminatory probabilities are both equal to $2^{-4}$. Under this circumstance, either of them can
be used to identify t.
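
The arithmetic of this example can be checked with a few lines of Python. The function name and the size of the library (five signatures, two of them with zero probability) are assumptions made only for illustration; base-2 logarithms are used because the example is stated in bits.

```python
import math


def rsde(probabilities):
    """Entropy in bits of an RSDPB vector; zero entries contribute nothing."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)


# Two signatures with probability 2^-4, one with 14/16, the rest zero.
example = [1 / 16, 1 / 16, 14 / 16, 0.0, 0.0]
h_example = rsde(example)

# Uniform case over the same five signatures: the worst scenario, log2(5) bits.
h_uniform = rsde([1 / 5] * 5)
```

Here `h_example` evaluates to about 0.6686 bits, in agreement with the value quoted above, while `h_uniform` equals $\log_2 5 \approx 2.32$ bits.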



2.4 EXPERIMENTS 

Two sets of hyperspectral image data, AVIRIS and HYDICE, will be used for
experiments. Since the performance of CBD, TD, OPD and JMD is very similar to that
of ED and SAM according to experiments conducted on these two data sets, only
four representative spectral measures, ED, SAM, SID and HMMID, one from each
category described in Section 2.2, are selected for performance evaluation.

2.4.1 AVIRIS Data

The data set to be used in this section is the Airborne Visible/Infrared Imaging
Spectrometer (AVIRIS) reflectance data shown in Fig. 1.5. For spectral variability,
Table 2.1 tabulates the statistics generated by SIM for these five signatures up to four
moments (not central moments) using (2.1) and (2.2), where $\mu_i$ is the $i$-th moment, and
Fig. 2.3 plots the values in Table 2.1 for graphical interpretation.



Table 2.1.

                   $\mu_1$   $\mu_2$   $\mu_3$   $\mu_4$
blackbrush         0.2670    0.0786    0.0245    0.0079
creosote leaves    0.4091    0.1941              0.0512
dry grass          0.5310              0.1628    0.0929
red soil           0.4202    0.1828    0.0812    0.0366
sagebrush          0.3979    0.1774              0.0419

Figure 2.3. Plots of Table 2.1

Table 2.2 also tabulates the self-information of these five signatures using SID
specified by (2.3) and HMM specified by (2.18) respectively, and Fig. 2.4 plots
them.

Table 2.2. Self-information of the five signatures

        blackbrush   creosote leaves   dry grass   red soil   sagebrush
SID     7.1814       7.0564            7.2740      7.2356     7.1709
HMM     3.5639       3.3771            3.9115      3.6489     3.5959

Figure 2.4. Plots of Table 2.2



As we can see in Tables 2.1 and 2.2, the HMM-based measure offers more valuable
information than SID does. For example, SID produces very close values of
self-information for all five signatures. In contrast, HMM categorizes the values of
self-information of these five signatures into three classes: one for the three signatures
blackbrush, creosote leaves and sagebrush, one for red soil and another for dry grass.
From Fig. 1.5 we can see that blackbrush (open circle), creosote leaves (asterisked line)
and sagebrush (diamonded line) have very similar spectra, while red soil and dry grass
are very distinct. This fact is also reflected in Table 2.1, where all their four moments
generated by SIM are very close. However, it is very difficult to determine which
signature is closer to another by visual inspection of the figure.








Table 2.3. Similarity values produced by ED among the five signatures in Fig. 1.5

                   blackbrush   creosote leaves   dry grass   red soil   sagebrush
blackbrush         0            0.1765            0.2568      0.4031     0.0681
creosote leaves                 0                 0.4182      0.5637     0.1288
dry grass                                         0           0.2175     0.2957
red soil                                                      0          0.4477
sagebrush                                                                0

Table 2.4. Similarity values produced by SAM among the five signatures in Fig. 1.5

                   blackbrush   creosote leaves   dry grass   red soil   sagebrush
blackbrush         0            0.1767            0.2575      0.4058     0.0681
creosote leaves                 0                 0.4213      0.5714     0.1289
dry grass                                         0           0.2179     0.2968
red soil                                                      0          0.4515
sagebrush                                                                0

Table 2.5. Similarity values produced by SID among the five signatures in Fig. 1.5

                   blackbrush   creosote leaves   dry grass   red soil   sagebrush
blackbrush         0            0.0497            0.0766      0.1861     0.0063
creosote leaves                 0                 0.2298      0.4154     0.0303
dry grass                                         0           0.0640     0.0973
red soil                                                      0          0.2340
sagebrush                                                                0

Table 2.6. Similarity values produced by HMMID among the five signatures in Fig. 1.5

                   blackbrush   creosote leaves   dry grass   red soil   sagebrush
blackbrush         0            1.5390            3.6717      3.9182     0.2263
creosote leaves                 0                 6.2549      6.7333     1.4102
dry grass                                         0           1.1302     4.0073
red soil                                                      0          3.4166
sagebrush                                                                0


Tables 2.3-2.6 tabulate the spectral similarity among these five signatures measured by 
ED, SAM, SID and HMMID using (2.20), (2.22), (2.29) and (2.32) respectively. 
In Tables 2.3-2.6, the smaller the value between two signatures is, the more 
similar the two signatures are. The results produced by ED and SAM were nearly the 
same because of (2.24). Moreover, as we can see, blackbrush is closest to sagebrush, 
and creosote leaves is also closest to sagebrush. If we examine the last column under 
sagebrush in Tables 2.3-2.6, we find that sagebrush is closer to blackbrush 
than to creosote leaves. The similarity value produced by ED and SAM between creosote 
leaves and sagebrush was about twice that between blackbrush and sagebrush. The value 
produced by SID between creosote leaves and sagebrush was about five times that between 
blackbrush and sagebrush, and the value produced by HMMID was about six times. For 
signatures whose spectra are dissimilar, HMMID produced much greater values than the 
other three measures did. For example, the similarity values between red soil and 
blackbrush, between red soil and creosote leaves, and between red soil and sagebrush 
were 3.9182, 6.7333 and 3.4166 respectively, which are much greater than their 
counterparts produced by ED, SAM and SID. An assessment based on Tables 2.3-2.6 alone 
may be subjective, which makes it difficult to compare the discriminatory power of the 
four measures. In order to better illustrate the results in Tables 2.3-2.6, we also plot 
their values in Fig. 2.5, where B, C, D, R, S represent blackbrush, creosote leaves, 
dry grass, red soil and sagebrush respectively, and where the plots produced by ED and 
SAM were nearly identical. 
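
To make the three closed-form measures concrete, the following is a minimal sketch of ED, SAM and SID for two spectra. The equations (2.20), (2.22) and (2.29) are not reproduced in this part of the text, so the forms below are the commonly used definitions and should be read as assumptions: ED as the Euclidean norm of the difference, SAM as the angle between the two vectors, and SID as the symmetric relative entropy between the spectra after each is normalized to sum to one. The function names and the sample spectra are hypothetical and are not data from the experiments.

```python
import math

def euclidean_distance(x, y):
    """ED: Euclidean norm of the difference between two spectra."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def spectral_angle(x, y):
    """SAM: angle in radians between two spectra viewed as vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    nx = math.sqrt(sum(a * a for a in x))
    ny = math.sqrt(sum(b * b for b in y))
    # Clamp to guard against rounding slightly outside [-1, 1].
    return math.acos(max(-1.0, min(1.0, dot / (nx * ny))))

def spectral_information_divergence(x, y):
    """SID: symmetric relative entropy between band-normalized spectra.

    Assumes all band values are strictly positive.
    """
    sx, sy = sum(x), sum(y)
    p = [a / sx for a in x]
    q = [b / sy for b in y]
    d_pq = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))
    d_qp = sum(qi * math.log(qi / pi) for pi, qi in zip(p, q))
    return d_pq + d_qp

# Hypothetical 4-band spectra, for illustration only.
s1 = [0.20, 0.30, 0.40, 0.10]
s2 = [0.40, 0.60, 0.80, 0.20]   # s1 scaled by 2
s3 = [0.10, 0.40, 0.30, 0.20]

print(euclidean_distance(s1, s2), spectral_angle(s1, s2),
      spectral_information_divergence(s1, s2))
print(euclidean_distance(s1, s3), spectral_angle(s1, s3),
      spectral_information_divergence(s1, s3))
```

In this sketch a pure scaling of a spectrum changes ED but leaves SAM and SID essentially at zero, which is one reason ED and SAM only agree closely when the signatures have been normalized.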




Figure 2.5. Plots of Tables 2.3-2.6 



However, it should be noted that the magnitude of the similarity values in Fig. 2.5 
does not by itself indicate the discriminatory power of each measure. We therefore 
calculate RSDPW for ED, SAM, SID and HMMID to evaluate their spectral discriminatory 
power relative to a reference signature d, which is chosen to be red soil. As we know, 
the signature of dry grass is very close to that of red soil. The power of HMMID in 
discriminating $s_i$ = dry grass from the other three signatures $s_j$ = blackbrush, 
creosote leaves and sagebrush was given by 3.4665, 5.9571 and 3.0227 respectively. These 
values of RSDPW produced by HMMID are much greater than those produced by ED and SAM, 
and comparable to those produced by SID (see Tables 2.7-2.10). This implies that the 
signature of dry grass is very different from those of blackbrush, creosote leaves and 
sagebrush. 

Comparing $\mathrm{RSDPW}_{\mathrm{HMMID}}(s_i = \text{dry grass}, s_j = \text{blackbrush}; d = \text{red soil}) = 3.4665$ to 
$\mathrm{RSDPW}_{\mathrm{HMMID}}(s_i = \text{dry grass}, s_j = \text{sagebrush}; d = \text{red soil}) = 3.0227$ suggests that 
blackbrush and sagebrush must have very similar signatures, which is indeed the case. 
This evidence is also provided by finding the RSDPWs of blackbrush and sagebrush with 
respect to the reference signature d specified by red soil: 

$\mathrm{RSDPW}_{\mathrm{SAM}}(s_i = \text{blackbrush}, s_j = \text{sagebrush}; d = \text{red soil}) = 1.1126$    (2.36) 

$\mathrm{RSDPW}_{\mathrm{ED}}(s_i = \text{blackbrush}, s_j = \text{sagebrush}; d = \text{red soil}) = 1.1106$    (2.37) 

$\mathrm{RSDPW}_{\mathrm{SID}}(s_i = \text{blackbrush}, s_j = \text{sagebrush}; d = \text{red soil}) = 1.2576$    (2.38) 

$\mathrm{RSDPW}_{\mathrm{HMMID}}(s_i = \text{blackbrush}, s_j = \text{sagebrush}; d = \text{red soil}) = 1.1468$    (2.39) 

All four values are close to one, which again shows how close the signatures of 
blackbrush and sagebrush are. The HMMID value of 1.1468 is smaller than the 1.2576 
produced by SID, although slightly larger than the 1.1106 and 1.1126 produced by ED and 
SAM respectively. 

Tables 2.7-2.10 tabulate the RSDPWs of ED, SAM, SID and HMMID respectively, 
with red soil used as the reference signature d, and Fig. 2.6 plots the corresponding 
values. 
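
As a worked check on how these tables relate to the similarity tables, the sketch below recomputes the ED entries from the red soil column of Table 2.3. The definition of RSDPW appears earlier in the chapter and is not repeated here, so the form used below, the larger of the two ratios of similarity values to the reference d, is an assumption; it does reproduce the tabulated ED values to four decimal places. The function and variable names are hypothetical.

```python
def rsdpw(m_i, m_j):
    """Relative discriminatory power of s_i and s_j with respect to d.

    m_i and m_j are the similarity values m(s_i, d) and m(s_j, d).
    """
    return max(m_i / m_j, m_j / m_i)

# ED similarity of each signature to red soil, copied from Table 2.3.
ed_to_red_soil = {
    "blackbrush": 0.4031,
    "creosote leaves": 0.5637,
    "dry grass": 0.2175,
    "sagebrush": 0.4477,
}

names = list(ed_to_red_soil)
table = {
    (a, b): rsdpw(ed_to_red_soil[a], ed_to_red_soil[b])
    for i, a in enumerate(names)
    for b in names[i + 1:]
}
for pair, value in table.items():
    print(pair, round(value, 4))
```

Because the measure is a ratio, it is always at least one, and a value near one means the two signatures are about equally far from the reference.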



Table 2.7. RSDPW of ED using red soil as a reference signature d 

                   blackbrush   creosote leaves   dry grass   sagebrush 
blackbrush         1            1.3984            1.8533      1.1106 
creosote leaves                 1                 2.5917      1.2591 
dry grass                                         1           2.0584 
sagebrush                                                     1 



Table 2.8. RSDPW of SAM using red soil as a reference signature d 

                   blackbrush   creosote leaves   dry grass   sagebrush 
blackbrush         1            1.4081            1.8623      1.1126 
creosote leaves                 1                 2.6223      1.2656 
dry grass                                         1           2.0721 
sagebrush                                                     1 


Table 2.9. RSDPW of SID using red soil as a reference signature d 

                   blackbrush   creosote leaves   dry grass   sagebrush 
blackbrush         1            2.2321            2.9078      1.2574 
creosote leaves                 1                 6.4906      1.7752 
dry grass                                         1           3.6562 
sagebrush                                                     1 



Table 2.10. RSDPW of HMMID using red soil as a reference signature d 

                   blackbrush   creosote leaves   dry grass   sagebrush 
blackbrush         1            1.7185            3.4665      1.1468 
creosote leaves                 1                 5.9571      1.9708 
dry grass                                         1           3.0227 
sagebrush                                                     1 


1: B-C, 2: B-D, 3: B-S, 4: C-D, 5: C-S, 6: D-S 

Figure 2.6. Plots of Tables 2.7-2.10 



From Fig. 2.6 it is clear that the stochastic information measures, HMMID and 
SID, were more effective than the deterministic measures ED and SAM. In order to evaluate 
which measure is more effective in terms of spectral discriminability, a mixed spectral 
signature was randomly generated to be used as a target signature t for identification. It 
was composed of 0.1055 blackbrush, 0.0292 creosote leaves, 0.0272 dry grass, 0.7588 
red soil and 0.0974 sagebrush. It should be noted that t was generated randomly, with no 
particular preference intended. From Tables 2.1-2.4, the spectrum of red soil is very 
similar to that of dry grass. Using (2.33), Table 2.11 tabulates the RSDPB of ED, SAM, 
SID and HMMID, and Fig. 2.7 plots the values of Table 2.11. 



Table 2.11. RSDPB produced by ED, SAM, SID and HMMID with t chosen to be a mixture of 0.1055 
blackbrush, 0.0292 creosote leaves, 0.0272 dry grass, 0.7588 red soil and 0.0974 sagebrush 

          blackbrush   creosote leaves   dry grass   red soil   sagebrush 
ED        0.2215       0.3417            0.1049      0.0773     0.2547 
SAM       0.2212       0.3430            0.1044      0.0769     0.2546 
SID       0.1897       0.4933            0.0588      0.0112     0.2500 
HMMID     0.2395       0.4911            0.0511      0.0036     0.2147 




1-blackbrush, 2-creosote, 3-drygrass, 4-redsoil, 5-sagebrush 

Figure 2.7. Plot of Table 2.11 



According to Table 2.11, the ratio of the discriminatory probability of t against dry 
grass to that against red soil was 0.1049:0.0773 ≈ 0.1044:0.0769 ≈ 1.36 for ED and SAM. 
By comparison, SID and HMMID yielded 0.0588:0.0112 = 5.25 and 0.0511:0.0036 = 14.19 
respectively. Fig. 2.7 likewise shows that SID and HMMID were more effective than the 
other two measures in identifying t as red soil, with a nearly zero value of spectral 
discriminatory probability at red soil. Table 2.12 gives the RSDE of ED, SAM, SID and 
HMMID calculated from Table 2.11, where HMMID produced the least entropy. Table 2.13 
gives the RSDE results of HMMID using different numbers of states. 

Table 2.12. RSDE of Table 2.11 produced by ED, SAM, SID and HMMID 

          ED        SAM       SID       HMMID 
RSDE      1.4835    1.4822    1.2274    1.1940 
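
The step from Table 2.11 to Table 2.12 can be sketched as follows. The definitions of RSDPB and RSDE are given earlier in the chapter and are not repeated here, so two assumptions are made: RSDPB normalizes the similarity values between t and each signature so that they sum to one, and RSDE is the entropy of the resulting probability vector. The natural logarithm is also an assumption; it is the base that reproduces the tabulated entropies, whereas base 2 does not. All names are hypothetical.

```python
import math

def rsdpb(similarities):
    """Normalize similarity values m(t, s_k) into a probability vector."""
    total = sum(similarities)
    return [m / total for m in similarities]

def rsde(probabilities):
    """Entropy of a discriminatory probability vector (natural logarithm)."""
    return -sum(p * math.log(p) for p in probabilities if p > 0)

# Rows of Table 2.11: blackbrush, creosote leaves, dry grass, red soil, sagebrush.
table_2_11 = {
    "ED":    [0.2215, 0.3417, 0.1049, 0.0773, 0.2547],
    "SAM":   [0.2212, 0.3430, 0.1044, 0.0769, 0.2546],
    "SID":   [0.1897, 0.4933, 0.0588, 0.0112, 0.2500],
    "HMMID": [0.2395, 0.4911, 0.0511, 0.0036, 0.2147],
}

entropies = {name: rsde(row) for name, row in table_2_11.items()}
for name, value in entropies.items():
    print(name, round(value, 4))
```

A lower entropy means the probability mass is concentrated on fewer signatures, that is, the measure is more decisive about which signature t resembles.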



Table 2.13. RSDEs resulting from different number of states 

# of states   3        4        5        6        7        8        9        10 
RSDE          1.1940   1.1939   1.1938   1.1936   1.1932   1.1943   1.1941   1.1941 



It was found that RSDE decreased slightly as the number of states increased from 3 to 7 
and changed very little overall. However, a large number of states generally increases 
the computational complexity of the parameter estimation process significantly. So, 
according to our experiments, a reasonable number of states ranges from 4 to 6, which 
achieves a good compromise. 

2.4.2 HYDICE Data 

Unlike the AVIRIS data studied in the previous section, the HYDICE data used in 
the following experiments were directly extracted from a HYDICE image scene of size 
64 × 64 shown in Fig. 1.6(a). The spectra of P1, P2, P3, P4 and P5 in Fig. 1.7(c) form 
a data set A = {P1, P2, P3, P4, P5} that was used for panel identification. Tables 
2.14-2.17 tabulate the similarity values resulting from ED, SAM, SID and HMMID. 



Table 2.14. Similarity values produced by ED among the five panel signatures in Fig. 2.5 

      P1          P2          P3          P4          P5 
P1    0           1301.6      2033.3      4107.3      4831.6 
P2                0           1340.4      5064.1      5733.0 
P3                            0           5434.1      5968.7 
P4                                        0           1125.4 
P5                                                    0 


Table 2.15. Similarity values produced by SAM among the five panel signatures in Fig. 2.5 

      P1          P2          P3          P4          P5 
P1    0           0.0435      0.0673      0.1144      0.1240 
P2                0           0.0430      0.1479      0.1567 
P3                            0           0.1652      0.1710 
P4                                        0           0.0248 
P5                                                    0 


Table 2.16. Similarity values produced by SID among the five panel signatures in Fig. 2.5 

      P1          P2          P3          P4          P5 
P1    0           0.0039      0.0086      0.0233      0.0313 
P2                0           0.0033      0.0385      0.0484 
P3                            0           0.0476      0.0570 
P4                                        0           0.0025 
P5                                                    0 


Table 2.17. Similarity values produced by HMMID among the five panel signatures in Fig. 2. 

      P1          P2          P3          P4          P5 
P1    0           0.0255      0.0291      0.2935      0.4798 
P2                0           0.0215      0.2891      0.4483 
P3                            0           0.3590      0.5483 
P4                                        0           0.0186 
P5                                                    0 


It is interesting to note that, unlike Tables 2.3-2.4, which show that ED and SAM 
generated very close values, the values generated by ED and SAM in Tables 2.14-2.15 
are very different. However, if the five panel signatures were normalized, the results 
produced by ED and SAM turned out to be very close. In analogy with Fig. 2.5, Fig. 
2.8 plots the similarity values of the four measures in Tables 2.14-2.17 for comparison. 
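
The effect of normalization on ED and SAM can be illustrated with a small numerical check. Equation (2.24) is not reproduced in this part of the text; the sketch below assumes it expresses the standard geometric identity that, for two unit-length vectors separated by an angle θ, the Euclidean distance equals 2 sin(θ/2). Under that assumption, the SAM entries of Table 2.4 predict the ED entries of Table 2.3 to within rounding, which would be consistent with the AVIRIS signatures having been normalized. The function name is hypothetical.

```python
import math

def ed_from_angle(theta):
    """ED between two unit-length vectors separated by angle theta (radians)."""
    return 2.0 * math.sin(theta / 2.0)

# (SAM, ED) values for the same signature pairs, copied from Tables 2.4 and 2.3.
pairs = [
    (0.1767, 0.1765),  # blackbrush vs creosote leaves
    (0.4058, 0.4031),  # blackbrush vs red soil
    (0.5714, 0.5637),  # creosote leaves vs red soil
    (0.4515, 0.4477),  # red soil vs sagebrush
]
for sam, ed in pairs:
    print(sam, ed, round(ed_from_angle(sam), 4))
```

For small angles 2 sin(θ/2) is close to θ itself, so on normalized signatures the two measures nearly coincide, while on unnormalized data such as the raw HYDICE panel spectra ED also reflects differences in vector length.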




6: P2-P4, 7: P2-P5, 8: P3-P4, 9: P3-P5, 10: P4-P5 
Figure 2.8. Plots of Tables 2.14-2.17 

The above HYDICE experiments also showed that ED and SAM performed very 
similarly because of (2.24). Tables 2.14-2.17 and Fig. 2.8 show that the signatures of 
P1, P2 and P3 are very close. Similarly, P4 and P5 have very close signatures, but 
they are very distinct from those of P1, P2 and P3. 

Tables 2.18-2.21 tabulate the RSDPW values for ED, SAM, SID and HMMID, where 
P2 was used as the reference signature d. These tables also demonstrate that SID and 
HMMID outperformed ED and SAM in terms of RSDPW, and that HMMID performed 
better than SID. 



Table 2.18. RSDPW of ED using P2 as a reference signature d 

      P1          P3          P4          P5 
P1    1           1.0298      3.8907      4.4046 
P3                1           3.7781      4.2771 
P4                            1           1.1321 
P5                                        1 



Table 2.19. RSDPW of SAM using P2 as a reference signature d 

      P1          P3          P4          P5 
P1    1           1.0116      3.4000      3.6023 
P3                1           3.4395      3.6442 
P4                            1           1.0595 
P5                                        1 



Table 2.20. RSDPW of SID using P2 as a reference signature d 

      P1          P3          P4          P5 
P1    1           1.1818      9.8718      12.4103 
P3                1           11.6667     14.6667 
P4                            1           1.2571 
P5                                        1 


Table 2.21. RSDPW of HMMID using P2 as a reference signature d 

      P1          P3          P4          P5 
P1    1           1.1860      11.3373     17.5804 
P3                1           13.4465     20.8512 
P4                            1           1.5507 
P5                                        1 



From Tables 2.18-2.21 we also plot their RSDPW values in Fig. 2.9. Like the 
AVIRIS experiments, the stochastic information measures, HMMID and SID performed 
more effectively than deterministic measures ED and SAM. 




1: P1-P3, 2: P1-P4, 3: P1-P5, 4: P3-P4, 5: P3-P5, 6: P4-P5 

Figure 2.9. Plots of Tables 2.18-2.21 

In order to evaluate the RSDPB, a target pixel vector randomly extracted from the 
white mask of P2 was chosen as t for identification. This choice was based on the fact 
that P2 is very close to both P1 and P3, but closer to P3, which enables us to 
evaluate the effectiveness of RSDPB. The pixel vector t was a panel edge pixel vector 
mixed with the background grass signature. Table 2.22 tabulates its discriminatory 
probabilities against the five panel signatures using ED, SAM, SID and HMMID, and Fig. 
2.10 plots the RSDPB values in Table 2.22. 



Table 2.22. RSDPB values produced by ED, SAM, SID and HMMID with t chosen from a pixel vector in 
the white mask of P2 

          P1          P2          P3          P4          P5 
ED        0.1530      0.1339      0.1578      0.2631      0.2922 
SAM       0.1544      0.1108      0.1482      0.2837      0.3028 
SID       0.1029      0.0520      0.0813      0.3419      0.4218 
HMMID     0.0994      0.0419      0.0939      0.3680      0.3968 



As we can see from Table 2.22, the RSDPBs among P1, P2 and P3 using ED and 
SAM were very close, whereas both SID and HMMID separated them clearly better than ED 
and SAM. This can be seen more clearly in Fig. 2.10, where SID and HMMID produced 
the lowest discriminatory probabilities of t against P2. 

Figure 2.10. Plot of Table 2.22 

If we further calculate the ratio of the second smallest RSDPB to the smallest 
RSDPB for ED, SAM, SID and HMMID respectively, 

$p_t^{\mathrm{ED}}(\mathrm{P1}) : p_t^{\mathrm{ED}}(\mathrm{P2}) = 0.1530 : 0.1339 = 1.14$,    (2.32) 

$p_t^{\mathrm{SAM}}(\mathrm{P3}) : p_t^{\mathrm{SAM}}(\mathrm{P2}) = 0.1482 : 0.1108 = 1.34$,    (2.33) 

$p_t^{\mathrm{SID}}(\mathrm{P3}) : p_t^{\mathrm{SID}}(\mathrm{P2}) = 0.0813 : 0.0520 = 1.56$,    (2.34) 

$p_t^{\mathrm{HMMID}}(\mathrm{P3}) : p_t^{\mathrm{HMMID}}(\mathrm{P2}) = 0.0939 : 0.0419 = 2.24$,    (2.35) 

we find that HMMID was about twice as effective as ED, and about 1.7 and 1.4 times as 
effective as SAM and SID respectively, in identifying t as P2. 

Table 2.23. RSDE of Table 2.22 

          ED        SAM       SID       HMMID 
RSDE      1.5586    1.5344    1.3230    1.3190 

Table 2.23 tabulates their respective RSDEs. Once again, HMMID produced the 
least entropy. More experiments can be found in Chang (2000), Du and Chang (2001) 
and Du (2000). 



2.5 CONCLUSIONS 

This chapter presents two new information theoretic hyperspectral measures, the spectral 
information measure (SIM) and the hidden Markov model (HMM)-based measure, for spectral 
characterization. Both measures use self-information to characterize spectral variability, 
similarity and discrimination for hyperspectral image analysis. SIM considers a 
hyperspectral image pixel vector as a random variable so that the spectral variability of 
the pixel vector can be described more effectively by its randomness. The hidden 
Markov model (HMM) is introduced to model the unobserved and hidden spectral 
properties of a hyperspectral image pixel vector as a Markov random process. With these 
interpretations, a SIM-derived spectral information divergence (SID) and an HMM-derived 
information divergence (HMMID) are further developed to measure the spectral 
similarity between two pixel vectors. In order to evaluate the effectiveness of a 
hyperspectral measure, a new definition of spectral discriminatory power is suggested for 
performance analysis. Finally, the concept of a spectral discriminatory probability vector 
is also developed for the purpose of material identification. Experiments have 
demonstrated that SID and HMMID capture spectral characteristics more effectively than 
the commonly used spectral similarity measures, Euclidean distance (ED) and spectral angle 
mapper (SAM). This is because both SID and HMMID are statistical measures, as opposed to 
ED and SAM, which are considered to be deterministic measures. In particular, the HMM can 
be used to specify complicated hidden spectral properties that cannot be observed in the 
spectrum of a hyperspectral pixel vector. As the number of spectral channels grows, the 
uncertainty and unpredictability in spectral analysis increase. Under this circumstance, 
SID and HMMID are better able than ED and SAM to characterize spectral properties, as 
demonstrated in the experiments. The disadvantage of HMMID is the complexity of 
implementing the HMM, where the parameter vector $\lambda = (A, B, \pi)$ used in the HMM 
must be estimated before it is used. As a final note, a discrete HMM-based spectral 
measure was also developed in (Du and Chang, 1999; Du, 2000), which was derived from the 
continuous HMM described in Section 2.1.2 by uniformly quantizing the spectrum of each 
pixel vector into a finite number of discrete values. The quantization used in the 
discrete HMM can simplify analysis, but may lead to loss of information, so an 
unsupervised clustering technique such as the C-means (or K-means) method must be used to 
minimize the information loss and capture the major features of a pixel vector. 




II 



SUBPIXEL DETECTION 



Target detection in remotely sensed images can be conducted spatially, spectrally or 
both. The difficulty with spatial analysis-based target detection arises from the fact that 
the ground sampling distance (GSD) is generally larger than the size of targets of interest, 
in which case targets are embedded in a single pixel and cannot be detected spatially. 
Under this circumstance target detection must be carried out at the subpixel level, and 
spectral analysis offers a valuable alternative. Since a hyperspectral image is generally 
represented by an image cube, an image pixel r is typically an L-dimensional column vector, 
where L is the number of spectral channels used for data acquisition. As a hyperspectral 
image vector, r can be completely characterized by two features, its vector length and its 
vector direction. In the light of this interpretation, the commonly used linear spectral 
mixture analysis (LSMA) basically deals with the vector length of r, whose projection along 
each coordinate corresponds to the abundance fraction of a particular spectral channel. 
When there are no constraints imposed on the pixel vector r, the LSMA is referred to as 
unconstrained LSMA. Many algorithms have been developed along this line over the 
past years. If the sum of the projections is constrained to one, LSMA becomes sum-to-one 
constrained LSMA, which seeks solutions in an L-dimensional surface of a polygon 
(Settle and Drake, 1993; Ashton and Schaum, 1998). On the other hand, if each of the 
projections is constrained to be nonnegative, the LSMA is referred to as nonnegativity 
constrained LSMA, which finds solutions in a nonnegative region in the L-dimensional 
space (Lawson and Hanson, 1995; Bro and Jong, 1997). If both the sum-to-one and 
nonnegativity constraints are imposed on the projections, the resulting LSMA is called 
fully constrained LSMA. In this case, it searches for solutions in an L-dimensional unit 
convex hull in a nonnegative region (Shimabukuro and Smith, 1991; Settle and Drake, 
1993; Ashton and Schaum, 1998; Heinz and Chang, 1999b; Heinz and Chang, 2001). A 
similar approach to the convex hull, called convex cone analysis, was also recently 
investigated in Ifarraguerri and Chang (1999). It should be noted that in the fully 
constrained LSMA the vector length of the projections is actually the city block distance 
specified by (2.19). Interestingly, the vector direction of an image pixel is as 
important as its vector length. Finding a vector direction of interest has been studied 
extensively in passive sensor array processing, where the arrival of signals from desired 
directions is of major interest. It seems that such a concept has not been explored in 
LSMA. 



In Part II, the problem of subpixel target detection in remote sensing images is 
considered, where the two constrained target detection approaches described above are 
studied and compared. One is a target abundance-constrained approach, which can be 
viewed as a vector length-constrained technique, and the other is a target 
signature-constrained approach, which can be considered as a vector direction-constrained 
technique. The former uses the target signature vectors used in LSMA as coordinates 
along which the projections of an image vector r are considered as target abundance 
fractions. It is based on the principle of orthogonality and least-squares error, from 
which several detection algorithms can be derived: the unconstrained orthogonal subspace 
projection (OSP) method, the sum-to-one constrained least-squares (SCLS) method and the 
nonnegatively constrained least-squares (NCLS) method, which will be discussed in 
Chapter 3. The latter is a linearly constrained minimum variance (LCMV) approach, which 
will be studied in Chapter 4. It makes use of a linear constraint to lock on the directions 
of target pixel vectors of interest while minimizing the energy of signal sources coming 
from other directions. In this case, two pixel vectors pointing in close directions will be 
considered as similar. This concept is similar to the spectral angle mapper (SAM) 
specified by (2.22) in Chapter 2 to measure spectral similarity. Two special versions of 
LCMV, the constrained energy minimization (CEM) method and the target-constrained 
interference-minimized filter (TCIMF), will also be introduced in Chapter 4. By means of a 
QR-decomposition, both CEM and TCIMF can be processed in real time to achieve 
computational efficiency. Despite their difference in algorithm design, the target 
abundance-constrained approach and the target signature-constrained approach arrive at the 
same functional form of a matched filter, using different levels of information that are 
determined by the spectral information used in the LSMA and the sample spectral 
information used in LCMV. Their relationship will also be explored in this chapter. In 
order to extend the methods in Chapters 3 and 4 to automatic subpixel detection, two 
types of automatic subpixel detection are considered: unsupervised subpixel detection in 
Chapter 5 and anomaly detection in Chapter 6. Three unsupervised learning algorithms 
are presented for unsupervised subpixel detection, namely an unsupervised vector 
quantization algorithm, an unsupervised target generation process and an unsupervised 
NCLS, which generate the spectral information directly from the image data in an 
unsupervised manner. For anomaly detection, several anomaly detectors are investigated, 
which include the RX detector (Reed and Yu, 1990), the low probability target detector 
(Harsanyi, 1993) and their variants. Since anomalous targets are generally small compared 
to their surroundings, the line-by-line real-time implementation considered in Chapter 4 is 
further extended to pixel-by-pixel real-time processing. Finally, Chapter 7 conducts an 
analysis of the sensitivity to target knowledge and noise, which have a significant 
impact on subpixel detection performance. 

TARGET ABUNDANCE-CONSTRAINED SUBPIXEL 
DETECTION: PARTIALLY CONSTRAINED 
LEAST-SQUARES METHODS 



Subpixel target detection has received considerable interest in remote sensing image 
processing in recent years (Sabol et al., 1992). Owing to the significantly improved 
spectral resolution provided by recent advances in remote sensing instruments, imaging 
spectrometers such as the AVIRIS and HYDICE sensors can now uncover and extract targets 
smaller than the pixel spatial resolution, in which case targets are generally embedded in 
a single pixel and cannot be detected spatially. As a result, traditional spatial-based 
image processing techniques are not directly applicable. In order to resolve this problem, 
we must rely on the spectral properties of these targets to detect them at the subpixel 
level. In this chapter, linear spectral mixture analysis (LSMA) is investigated for 
subpixel target detection. In particular, we consider target abundance-constrained 
subpixel detection, which imposes different partial constraints on the abundance fractions 
of the target signatures used in the LSMA. Two partially abundance-constrained methods, 
referred to as sum-to-one constrained least-squares (SCLS) and nonnegatively constrained 
least-squares (NCLS), will be presented in this chapter. SCLS constrains the abundance 
fractions of the target signatures to sum to one, with no constraint on the nonnegativity 
of the abundance fractions. On the other hand, NCLS requires the abundance fractions of 
the target signatures to be nonnegative, but discards the abundance sum-to-one constraint. 



3.1 INTRODUCTION 

The LSMA has been widely used for image endmember classification, which will be 
discussed in Chapter 8 in great detail. It can be briefly described as follows. Suppose that 
there are p targets, $t_1, t_2, \ldots, t_p$, present in an image scene. Let 
$\mathbf{m}_1, \mathbf{m}_2, \ldots, \mathbf{m}_p$ denote their respective spectral signatures (i.e., spectra) and 
$\mathbf{r}$ be a hyperspectral image pixel vector. LSMA assumes that the spectral signature or 
spectrum of $\mathbf{r}$ can be represented by a linear mixture of 
$\mathbf{m}_1, \mathbf{m}_2, \ldots, \mathbf{m}_p$ with appropriate abundance fractions specified by 
$\alpha_1, \alpha_2, \ldots, \alpha_p$. In general, two constraints should be imposed on this model to 
produce a desired solution. These are (a) the abundance sum-to-one constraint, referred to 
as the ASC, that is, $\sum_{j=1}^{p} \alpha_j = 1$, and (b) the abundance nonnegativity constraint, 
referred to as the ANC, that is, $\alpha_j \ge 0$ for all $1 \le j \le p$. An LSMA imposing these two 
constraints has been studied in the past (Shimabukuro and Smith, 1991; Settle and Drake, 
1993; Ashton and Schaum, 1998; Heinz and Chang, 1999b; Heinz and Chang, 2001). It can be 
used for quantification of materials present in any pixel vector $\mathbf{r}$, where the abundance 
fractions $\alpha_1, \alpha_2, \ldots, \alpha_p$ can be estimated accurately to reflect their true abundance 
fractions. Such material quantification cannot be accomplished by many unconstrained or 
partially constrained LSMA methods. However, from a target detection point of view, 
whether or not the estimated amount of a target abundance fraction is accurate may not be 
essential. As long as the estimated abundance fractions of the desired target pixel vectors 
can distinguish themselves from those of their surrounding pixel vectors, the targets can 
be detected effectively even if the abundance fractions of LSMA do not satisfy the ASC or 
the ANC. Such target detectability was demonstrated through a family of unconstrained 
orthogonal subspace projection (OSP)-based methods (Harsanyi and Chang, 1994; Chang et 
al., 1998b). Therefore, in some cases, an LSMA-based fully constrained least-squares 
method (FCLS) may not be as effective as unconstrained or partially constrained methods 
in terms of target detection (Heinz and Chang, 2001). This is because a fully constrained 
method requires both the ASC and the ANC to be satisfied, so that the estimated abundance 
fractions must be confined to the range [0,1], which limits its target detection 
capability. 

Two LSMA-based partially constrained least-squares methods have been considered 
in the past: the sum-to-one constrained least-squares method (SCLS) (Settle and Drake, 
1993; Ashton and Schaum, 1998) and the nonnegativity constrained least-squares method 
(NCLS) (Lawson and Hanson, 1995; Bro and Jong, 1997). SCLS imposes the ASC while 
ignoring the ANC. Conversely, NCLS implements the ANC while discarding the ASC. As a 
result, both methods generally do not estimate target abundance fractions accurately. 
Nevertheless, their estimated abundance fractions are generally more accurate than those 
estimated by the unconstrained method and can still be used for target detection. Since 
SCLS-generated abundance fractions must sum to one, the magnitudes of the 
SCLS-estimated target abundance fractions are usually spread out in the range [0,1]. 
Accordingly, these abundance fractions will be relatively small in order to satisfy the 
sum-to-one constraint. This is particularly true when an image scene contains many target 
signatures. Therefore, the target detectability is substantially reduced in such cases. This 
situation becomes even worse if the spectra of targets are very similar. On the other hand, 
NCLS-generated abundance fractions do not have this problem. Being free of the ASC, 
NCLS can use whatever nonnegative values it generates for abundance fractions. Despite 
the fact that its estimated abundance fractions may not reflect the true abundance 
fractions, the target detectability resulting from NCLS may actually benefit from not 
satisfying the ASC. As a consequence, NCLS performs better than SCLS in target detection. 
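
To illustrate what imposing only the ASC means in practice, the following is a minimal sketch of a sum-to-one constrained least-squares estimate. It uses the standard closed form for least squares with one linear equality constraint, obtained with a Lagrange multiplier; this is an assumption about the form of the estimator and is not taken from the SCLS derivation later in the chapter. The function names, the signature matrix and the pixel are hypothetical.

```python
import numpy as np

def unconstrained_ls(M, r):
    """Unconstrained least-squares abundance estimate."""
    return np.linalg.solve(M.T @ M, M.T @ r)

def scls(M, r):
    """Least-squares abundance estimate subject to sum(alpha) == 1.

    Closed form obtained with a Lagrange multiplier on the equality constraint.
    """
    p = M.shape[1]
    ones = np.ones(p)
    gram_inv = np.linalg.inv(M.T @ M)
    alpha_ls = gram_inv @ (M.T @ r)
    correction = gram_inv @ ones * (ones @ alpha_ls - 1.0) / (ones @ gram_inv @ ones)
    return alpha_ls - correction

# Hypothetical signatures (columns of M) and pixel, for illustration only.
rng = np.random.default_rng(0)
M = rng.uniform(0.1, 1.0, size=(6, 3))
alpha_true = np.array([0.2, 0.5, 0.3])
r = M @ alpha_true + 0.01 * rng.standard_normal(6)

alpha_ls = unconstrained_ls(M, r)
alpha_scls = scls(M, r)
print(alpha_ls.sum(), alpha_scls.sum())
```

The constrained estimate sums to one but its entries may still be negative. An NCLS estimate has no such closed form and is usually computed iteratively, for example with a Lawson-Hanson type solver such as `scipy.optimize.nnls`.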



3.2 LINEAR SPECTRAL MIXTURE MODEL 

Suppose that L is the number of spectral bands and $t_1, t_2, \ldots, t_p$ are p targets present 
in an image scene. Let $\mathbf{m}_1, \mathbf{m}_2, \ldots, \mathbf{m}_p$ denote their corresponding target signature 
vectors, which are generally referred to as digital numbers (DN). A linear spectral mixture 
of $\mathbf{r}$ is a linear combination of $\mathbf{m}_1, \mathbf{m}_2, \ldots, \mathbf{m}_p$ with appropriate abundance fractions 
specified by $\alpha_1, \alpha_2, \ldots, \alpha_p$. More precisely, let $\mathbf{r}$ be an $L \times 1$ column pixel vector 
and $\mathbf{M}$ be an $L \times p$ target spectral signature matrix, denoted by 
$[\mathbf{m}_1\ \mathbf{m}_2 \cdots \mathbf{m}_p]$, where $\mathbf{m}_j$ is an $L \times 1$ column vector for $1 \le j \le p$. Also let 
$\boldsymbol{\alpha} = (\alpha_1\ \alpha_2 \cdots \alpha_p)^T$ be a $p \times 1$ abundance column vector associated with $\mathbf{r}$, 
with $\alpha_j$ denoting the abundance fraction of the j-th target signature $\mathbf{m}_j$ present in the 
pixel vector $\mathbf{r}$. We can model the spectral signature of an image pixel $\mathbf{r}$ by a linear 
regression form 

$\mathbf{r} = \mathbf{M}\boldsymbol{\alpha} + \mathbf{n}$    (3.1) 

where $\mathbf{n}$ is noise, which can also be interpreted as a measurement error or a model error. 
Here, where no confusion arises, $\mathbf{r}$ is used to indicate either an image pixel or its 
spectral signature vector interchangeably throughout the book. A classical approach is 
linear unmixing, which finds a set of appropriate abundance fractions 
$\alpha_1, \alpha_2, \ldots, \alpha_p$ of the p targets $t_1, t_2, \ldots, t_p$ that are contained in the pixel vector $\mathbf{r}$. 
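
The model (3.1) and unconstrained linear unmixing can be sketched numerically as follows. This is only an illustration under stated assumptions: the signature matrix, abundance vector and noise level are hypothetical, and ordinary least squares is used as the unmixing step.

```python
import numpy as np

rng = np.random.default_rng(1)
L, p = 8, 3                         # number of bands and targets (hypothetical)
M = rng.uniform(0.1, 1.0, (L, p))   # columns play the role of m_1, ..., m_p
alpha = np.array([0.6, 0.3, 0.1])   # abundance vector
n = 0.005 * rng.standard_normal(L)  # small additive noise

r = M @ alpha + n                   # r = M alpha + n, as in (3.1)

# Unconstrained linear unmixing: least-squares estimate of alpha.
alpha_hat, residuals, rank, _ = np.linalg.lstsq(M, r, rcond=None)
print(alpha_hat)
```

With no noise the estimate recovers the abundance vector exactly whenever the columns of M are linearly independent; with noise the estimate is no longer guaranteed to satisfy either the ASC or the ANC.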

3.3 ORTHOGONAL SUBSPACE PROJECTION (OSP) 

In order to solve for $\boldsymbol{\alpha} = (\alpha_1\ \alpha_2 \cdots \alpha_p)^T$, an unconstrained orthogonal subspace 
projection (OSP) approach was proposed by Harsanyi and Chang (1994). It 
assumed that there was a desired target, say $t_p$, among the p targets $t_1, t_2, \ldots, t_p$, and 
that the remaining $t_1, t_2, \ldots, t_{p-1}$ were undesired targets. Since the desired target $t_p$ 
was the only target of interest, all other targets were considered as interferers to $t_p$. 
In this case, the OSP approach eliminated the interfering effects caused by these 
undesired targets $t_1, t_2, \ldots, t_{p-1}$ before the detection of $t_p$ took place. As a 
consequence of the annihilation of $t_1, t_2, \ldots, t_{p-1}$, the detectability of $t_p$ was improved. 
The main idea of the OSP approach can be described mathematically as follows. 

It first separates $t_p$ from $t_1, t_2, \ldots, t_{p-1}$ and rewrites (3.1) as 

$\mathbf{r} = \mathbf{d}\alpha_p + \mathbf{U}\boldsymbol{\gamma} + \mathbf{n}$    (3.2) 

where $\mathbf{d} = \mathbf{m}_p$ is the desired spectral signature of $t_p$, 
$\mathbf{U} = [\mathbf{m}_1\ \mathbf{m}_2 \cdots \mathbf{m}_{p-1}]$ is the undesired target spectral signature matrix made up of 
the spectral signatures of the remaining p-1 undesired targets $t_1, t_2, \ldots, t_{p-1}$, and 
$\boldsymbol{\gamma}$ collects their abundance fractions. Here, without loss of generality, we assume that 
the desired target consists of a single target $t_p$. Nevertheless, the approach can be 
extended to multiple targets, as discussed in the TCIMF approach in Chapter 4. 

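Separating $\mathbf{U}$ from $\mathbf{d}$ allows us to design an orthogonal subspace projector to 
annihilate $\mathbf{U}$ from the pixel vector $\mathbf{r}$ before the detection of $t_p$ takes place. One such 
orthogonal subspace projector was derived in Harsanyi and Chang (1994) and is 
given by 

$P_U^{\perp} = \mathbf{I}_{L \times L} - \mathbf{U}\mathbf{U}^{\#}$    (3.3) 

where $\mathbf{U}^{\#} = (\mathbf{U}^T\mathbf{U})^{-1}\mathbf{U}^T$ is the pseudo-inverse of $\mathbf{U}$ and $\mathbf{I}_{L \times L}$ is the L-dimensional 
identity matrix. The notation "$\perp$" in $P_U^{\perp}$ indicates that the projector maps the image 
pixel vector $\mathbf{r}$ into $\langle \mathbf{U} \rangle^{\perp}$, the space orthogonal to $\langle \mathbf{U} \rangle$. 

Applying $P_U^{\perp}$ to (3.2) leads to an undesired target-eliminated spectral mixture model 

$P_U^{\perp}\mathbf{r} = P_U^{\perp}\mathbf{d}\alpha_p + P_U^{\perp}\mathbf{n}$    (3.4) 

where the undesired signatures in $\mathbf{U}$ have been eliminated and the original noise $\mathbf{n}$ has 
been suppressed to $P_U^{\perp}\mathbf{n}$. 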

Equation (3.4) represents a standard signal detection problem. If we assume that a detector solving (3.4) is a linear filter specified by a filter vector w, the filter output is then given by

$w^T P_U^\perp r = w^T P_U^\perp d\,\alpha_p + w^T P_U^\perp n$. (3.5)

A commonly used criterion to measure the detection performance specified by (3.5) is the signal-to-noise ratio (SNR), which is defined as the ratio of output signal energy to output noise energy. According to (3.5), the output signal and noise energies can be obtained from the variances of $w^T P_U^\perp d\,\alpha_p$ and $w^T P_U^\perp n$, respectively. So, the SNR resulting from (3.5) is given by

$\mathrm{SNR}(w) = \dfrac{(w^T P_U^\perp d)\,\alpha_p^2\,(d^T P_U^\perp w)}{w^T P_U^\perp E[n n^T] P_U^\perp w} = \dfrac{\alpha_p^2}{\sigma^2}\,\dfrac{w^T (P_U^\perp d)(P_U^\perp d)^T w}{w^T P_U^\perp w}$ (3.6)



where the noise is assumed to be white with variance $\sigma^2$ and $P_U^\perp$ is idempotent, i.e., $(P_U^\perp)^2 = P_U^\perp$. An optimal detector for (3.4) can be obtained by maximizing the SNR specified by (3.6) over the filter vector w. By virtue of Schwarz's inequality, $|w^T (P_U^\perp d)| \le \|w\|\,\|P_U^\perp d\|$ with equality if and only if $w = \kappa P_U^\perp d$ for some constant $\kappa$, where $\|\cdot\|$ is the Euclidean norm (see (2.20)). So, the desired optimal filter vector $w^*$ can be found by $w^* = \kappa P_U^\perp d$. Additionally, it is known that the maximum SNR, that is, $\max_w \mathrm{SNR}(w)$ in (3.6), is the largest eigenvalue, $\lambda_{\max}$, of the matrix $(\alpha_p^2/\sigma^2)(P_U^\perp d)(P_U^\perp d)^T$ appearing in the second equality of (3.6) (Stark and Woods, 2002). Since $P_U^\perp d$ is a vector, the rank of the matrix $(P_U^\perp d)(P_U^\perp d)^T$ is one. Therefore, there is only one nonzero eigenvalue, which is $\lambda_{\max}$. As also shown by Schwarz's inequality, $w^* = \kappa P_U^\perp d$ is an optimal solution to (3.6), in which case the eigenvector corresponding to $\lambda_{\max}$ must be of the form $w^* = \kappa P_U^\perp d$. If we further define a matched filter, $M_w(x)$, using the filter vector w as the matched signal by

$M_w(x) = \kappa\, w^T x$ for some constant $\kappa$, (3.7)



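the optimal detector solving (3.4) turns out to be the matched filter $M_{w^*}$ with the matched signal specified by $w^* = \kappa P_U^\perp d$. The maximum of $\mathrm{SNR}(w)$ is then attained at $w^*$ and is equal to $\lambda_{\max}$, that is,

$\max_w \mathrm{SNR}(w) = \mathrm{SNR}(w^*) = \lambda_{\max} = (\alpha_p^2 / \sigma^2)\, d^T P_U^\perp d$. (3.8)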

Based on the above approach with the SNR criterion defined by (3.6), an OSP detector, denoted by $\delta_{OSP}(r)$, can be derived as a linear matched filter

$\delta_{OSP}(r) = d^T P_U^\perp r$ (3.9)

which operates on r with the matched signal given by $w^* = P_U^\perp d$ with $\kappa = 1$ in (3.7).

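An alternative approach proposed in Harsanyi and Chang (1994) and Harsanyi (1993) is to operate on (3.4) with $x^T$ to maximize the SNR. The resulting SNR is given by

$\mathrm{SNR}(x) = \dfrac{x^T P_U^\perp d\,\alpha_p^2\,d^T P_U^\perp x}{x^T P_U^\perp E[n n^T] P_U^\perp x}$. (3.10)

Now maximizing (3.10) is equivalent to solving the following generalized eigenvalue problem for x (see Theorem 5.5-1 in Stark and Woods (2002)),

$\sigma^{-2}\,[P_U^\perp d\,\alpha_p^2\,d^T P_U^\perp]\,x = \lambda\,P_U^\perp x$. (3.11)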

Since the detection problem specified by (3.4) represents a two-class classification problem, the rank of the matrix $\sigma^{-2}[P_U^\perp d\,\alpha_p^2\,d^T P_U^\perp]$ on the left of (3.11) is one. This implies that the only nonzero eigenvalue is the maximum eigenvalue $\lambda_{\max}$. Interestingly, $\lambda = (\alpha_p^2/\sigma^2)\, d^T P_U^\perp d$ and $x = \kappa d$ are a solution that satisfies (3.11). Substituting $(\alpha_p^2/\sigma^2)\, d^T P_U^\perp d$ for $\lambda$ and $\kappa d$ for x in (3.11) results in an equation identical to (3.8). This shows that the solution to (3.4) using (3.10) as a criterion is also a matched filter, $M_w$, with the matched signal w, which turns out to be the desired signature vector d and is specified by the eigenvector corresponding to $\lambda_{\max} = (\alpha_p^2/\sigma^2)\, d^T P_U^\perp d$. As a result, the linear optimal filter that maximizes (3.10) for (3.4) is exactly the OSP detector $\delta_{OSP}(r)$ specified by (3.9).

The $\delta_{OSP}(r)$ defined by (3.9) is implemented by an undesired signature annihilator $P_U^\perp$ followed by a matched filter $M_d$. More precisely, if we want to detect a target signature d in a mixed pixel vector r, we first apply $P_U^\perp$ to (3.2) to eliminate U, then use the matched filter $M_d$ to extract d from (3.4).
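
The two-stage structure just described (annihilate U, then match against d) can be sketched numerically as follows. This is only a minimal illustration under stated assumptions: the function names `osp_projector` and `osp_detector` are hypothetical, the signatures are random synthetic vectors, and the pixel is noise-free so that the output can be checked against $\alpha_p\, d^T P_U^\perp d$.

```python
import numpy as np

def osp_projector(U):
    """P_U^perp = I - U U^#, with U^# the pseudo-inverse of the L x q matrix U, cf. (3.3)."""
    return np.eye(U.shape[0]) - U @ np.linalg.pinv(U)

def osp_detector(r, d, U):
    """OSP detector of (3.9): delta_OSP(r) = d^T P_U^perp r."""
    return d @ osp_projector(U) @ r

# Synthetic, noise-free mixed pixel (all values are made up for illustration).
rng = np.random.default_rng(0)
L = 50                                   # number of spectral bands
d = rng.random(L)                        # desired signature
U = rng.random((L, 3))                   # three undesired signatures
alpha_p = 0.3                            # abundance of the desired target
gamma = np.array([0.2, 0.4, 0.1])        # abundances of the undesired targets
r = alpha_p * d + U @ gamma              # model (3.2) without noise

P = osp_projector(U)
score = osp_detector(r, d, U)            # equals alpha_p * d^T P d up to round-off
```

Because the output is scaled by $d^T P_U^\perp d$, the raw OSP score is suitable for detection but is not itself an abundance estimate.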

It should be noted that Harsanyi and Chang's OSP detector $\delta_{OSP}(r)$ did not estimate the abundance vector $\alpha$ in (3.1). In reality, $\alpha$ is generally not known and needs to be estimated. In order to estimate $\alpha = (\alpha_1, \alpha_2, \ldots, \alpha_p)^T$ directly from the image data, several least-squares based techniques were developed (Tu et al., 1998; Chang et al., 1998b). It is known that the least-squares estimate of $\alpha$ in (3.1), denoted by $\hat{\alpha}_{LS} = (\hat{\alpha}_{LS,1}, \hat{\alpha}_{LS,2}, \ldots, \hat{\alpha}_{LS,p})^T$, can be obtained by (Scharf, 1992)

$\hat{\alpha}_{LS} = (M^T M)^{-1} M^T r$. (3.12)

It should be noted that $\hat{\alpha}_{LS}$ is a function of r. To simplify our notation, this dependency on r is not written explicitly. So, an LS detector, denoted by $\delta_{LS}(r)$, can be derived from (3.9) and (3.12) as

$\delta_{LS}(r) = (d^T P_U^\perp d)^{-1}\, \delta_{OSP}(r)$. (3.13)

The detection of the desired target specified by the spectral signature d can be accomplished via $\delta_{LS}(r)$ by finding the estimated abundance fraction $\hat{\alpha}_p$ contained in $\hat{\alpha}_{LS}$. The LS detector $\delta_{LS}(r)$ given by (3.13) was referred to as a posteriori OSP in Chang et al. (1998a) and LSOSP in Tu et al. (1997), whereas Harsanyi and Chang's OSP was referred to as a priori OSP. More discussion of least-squares estimation will be given in Chapter 8.
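
The relation between the OSP output and the least-squares abundance estimate can be checked numerically. The sketch below is illustrative only: the data are synthetic, the variable names are our own, and the desired signature d is assumed to be the last column of M, as in (3.2). It verifies that the scaled OSP output of (3.13) coincides with the last component of the least-squares estimate (3.12).

```python
import numpy as np

rng = np.random.default_rng(1)
L, p = 40, 4
M = rng.random((L, p))                   # synthetic signature matrix [U d]
U, d = M[:, :-1], M[:, -1]               # assume d = m_p is the last column
alpha = np.array([0.1, 0.2, 0.3, 0.4])   # made-up true abundances
r = M @ alpha + 0.01 * rng.standard_normal(L)

# Unconstrained least-squares estimate, cf. (3.12).
alpha_ls = np.linalg.solve(M.T @ M, M.T @ r)

# OSP output (3.9) and its scaled version (3.13).
P = np.eye(L) - U @ np.linalg.pinv(U)
delta_osp = d @ P @ r
delta_ls = delta_osp / (d @ P @ d)
```

Here `delta_ls` and `alpha_ls[-1]` agree up to round-off, which is why the LS detector can be read as an abundance estimator while the raw OSP output cannot.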

3.4 SUM-TO-ONE CONSTRAINED LEAST-SQUARES METHOD (SCLS) 

Equation (3.1) is a general linear mixture model with no constraints imposed on the abundance vector $\alpha = (\alpha_1, \alpha_2, \ldots, \alpha_p)^T$. So, it only offers an unconstrained solution and generally does not provide accurate estimates of abundance fractions. In this section, we consider a partially abundance-constrained approach that imposes the ASC on (3.1), which results in the following SCLS linear mixing problem:

$\min_{\alpha} \{(r - M\alpha)^T (r - M\alpha)\}$ subject to $\Delta = \{\alpha \mid \sum_{j=1}^{p} \alpha_j = 1\}$. (3.14)

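The solution to (3.14), $\hat{\alpha}_{SCLS}$, can be derived as

$\hat{\alpha}_{SCLS} = P_{M,1}^\perp \hat{\alpha}_{LS} + (M^T M)^{-1} 1\,[1^T (M^T M)^{-1} 1]^{-1}$ (3.15)

where $\hat{\alpha}_{LS}$ is given by (3.12) and

$P_{M,1}^\perp = I_{p \times p} - (M^T M)^{-1} 1\,[1^T (M^T M)^{-1} 1]^{-1} 1^T$ (3.16)

with $1 = (1, 1, \ldots, 1)^T$ being a p-dimensional column vector of ones. Using (3.15), an SCLS detector, denoted by $\delta_{SCLS}(r)$, can be obtained by

$\delta_{SCLS}(r) = \hat{\alpha}_{SCLS}(r)$. (3.17)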

Note that (3.15) was also derived in Settle and Drake (1997) and Ashton and Schaum (1998). Like $\delta_{LS}(r)$, the detection of the desired target specified by the spectral signature d can be achieved via (3.17) by finding the estimated abundance fraction $\hat{\alpha}_p$ contained in $\hat{\alpha}_{SCLS}$.
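
As a rough numerical companion to the closed-form SCLS solution (an unconstrained least-squares estimate plus a sum-to-one correction), the following sketch may be helpful. The function name `scls` and the synthetic data are assumptions made for illustration; the code simply evaluates the projector and correction term of that closed form.

```python
import numpy as np

def scls(M, r):
    """Sum-to-one constrained least-squares estimate, cf. (3.15)-(3.16)."""
    p = M.shape[1]
    G = np.linalg.inv(M.T @ M)                    # (M^T M)^{-1}
    ones = np.ones(p)
    alpha_ls = G @ M.T @ r                        # unconstrained estimate, cf. (3.12)
    g1 = G @ ones                                 # (M^T M)^{-1} 1
    denom = ones @ g1                             # 1^T (M^T M)^{-1} 1
    P = np.eye(p) - np.outer(g1, ones) / denom    # projector of (3.16)
    return P @ alpha_ls + g1 / denom              # estimate of (3.15)

# Synthetic example with made-up signatures and abundances.
rng = np.random.default_rng(2)
M = rng.random((30, 3))
r = M @ np.array([0.5, 0.3, 0.2]) + 0.02 * rng.standard_normal(30)
alpha_scls = scls(M, r)
```

The estimate sums to one by construction, but nothing in it prevents individual components from being negative.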



3.5 NONNEGATIVELY CONSTRAINED LEAST-SQUARES METHOD (NCLS) 

Because the SCLS described above is based only on the constraint $\sum_{j=1}^{p} \alpha_j = 1$, its solution $\hat{\alpha}_{SCLS}$ does not guarantee that the estimated abundance fractions are nonnegative, i.e., $\alpha_j \ge 0$ for all $1 \le j \le p$. As an alternative, we consider imposing the ANC ($\alpha_j \ge 0$ for each $1 \le j \le p$) rather than the ASC on (3.1). The resulting problem is referred to as an NCLS problem. Because the nonnegativity constraint consists of a set of inequalities, no closed-form analytical solution can be derived. Furthermore, since NCLS does not satisfy the ASC, its estimated abundance fractions are generally not accurate. So, why is it important to consider NCLS? The truth is that NCLS may not be as good as the fully constrained method considered in Heinz and Chang (1999b, 2001) for material quantification. But, as a target detector, an NCLS-based detector that does not satisfy the ASC may be more effective than a fully constrained quantifier. What is a disadvantage for quantification thus turns out to be an advantage for NCLS in enhancing target detectability.

In general, an NCLS approach solves the following optimization problem

Minimize LSE $= (M\alpha - r)^T (M\alpha - r)$ subject to $\alpha \ge 0$ (3.18)

where LSE is the least-squares error used as the criterion for optimality and $\alpha \ge 0$ represents the nonnegativity constraint $\alpha_j \ge 0$ for all $1 \le j \le p$. As noted, $\alpha \ge 0$ is a set of inequalities, so the Lagrange multiplier method cannot be applied directly to find the optimal solution. In order to mitigate this dilemma, we introduce a p-dimensional unknown positive constraint vector $c = (c_1, c_2, \ldots, c_p)^T$ with $c_j > 0$ for $1 \le j \le p$ to take care of the ANC. By means of c we can form a Lagrangian $J(\alpha)$ as follows:

$J(\alpha) = (1/2)(M\alpha - r)^T (M\alpha - r) + \lambda^T (\alpha - c)$ (3.19)



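subject to the constraint $\alpha = c$. Differentiating $J(\alpha)$ with respect to $\alpha$ yields

$\left.\dfrac{\partial J(\alpha)}{\partial \alpha}\right|_{\hat{\alpha}_{NCLS}} = 0 \;\Rightarrow\; M^T M \hat{\alpha}_{NCLS} - M^T r + \lambda = 0$ (3.20)

which results in the following two iterative equations given by

$\hat{\alpha}_{NCLS} = (M^T M)^{-1} M^T r - (M^T M)^{-1} \lambda = \hat{\alpha}_{LS} - (M^T M)^{-1} \lambda$ (3.21)

and

$\lambda = M^T (r - M \hat{\alpha}_{NCLS})$. (3.22)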

Equations (3.21) and (3.22) can be used iteratively to find the optimal solution $\hat{\alpha}_{NCLS}$ via the Lagrange multiplier vector $\lambda = (\lambda_1, \lambda_2, \ldots, \lambda_p)^T$. An NCLS detector, denoted by $\delta_{NCLS}(r)$, can be derived by

$\delta_{NCLS}(r) = \hat{\alpha}_{NCLS}(r)$ (3.23)

where the detection of the desired target specified by the spectral signature d is achieved by finding the estimated abundance fraction $\hat{\alpha}_p(r)$ contained in $\hat{\alpha}_{NCLS}$.

The nonnegativity-constrained optimization problem given by (3.18) was previously explored by Lawson and Hanson (1995), who called it the nonnegative least-squares (NNLS) method. Based on Lawson and Hanson's NNLS, two fast NNLS algorithms, referred to as FNNLS and FNNLSb, were further developed by Bro and Jong (1996). Their idea was to first decompose the components of the estimate $\hat{\alpha}$ into two index sets, called the active set, R, and the passive set, P, where R consists of all indices corresponding to negative (or zero) components in the estimate $\hat{\alpha}$ and P contains all indices corresponding to positive components in the estimate $\hat{\alpha}$. The NNLS and the FNNLS started off with an empty passive set $P = \emptyset$ and an active set containing all components of $\hat{\alpha}$, i.e., $R = \{1, 2, \ldots, p\}$. They then adjusted both sets P and R via iterations using (3.21)-(3.22). It has been shown in Lawson and Hanson (1995) that when an optimal solution was reached, the Lagrange multiplier vector $\lambda$ must satisfy

$\lambda_j = 0$ for $j \in P$; $\lambda_j < 0$ for $j \in R$. (3.24)

The final passive set identified which components were legitimate to be used in the abundance estimation, $\hat{\alpha}_{NCLS}$.

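Following an idea similar to Bro and Jong's approach, the NCLS algorithm can be implemented as follows.

NCLS Algorithm

1. Initialization: Set the passive set $P^{(0)} = \{1, 2, \ldots, p\}$ and the active set $R^{(0)} = \emptyset$. Set $k = 0$.

2. Compute $\hat{\alpha}_{LS}$ using (3.12). Let $\hat{\alpha}_{NCLS}^{(0)} = \hat{\alpha}_{LS}$.

3. At the k-th iteration, if all components in $\hat{\alpha}_{NCLS}^{(k)}$ are positive, the algorithm is terminated; otherwise, continue.

4. Let $k \leftarrow k + 1$.

5. Move all indices in $P^{(k-1)}$ that correspond to negative components of $\hat{\alpha}_{NCLS}^{(k-1)}$ to $R^{(k-1)}$. Let the resulting index sets be denoted by $P^{(k)}$ and $R^{(k)}$, respectively. Create a new index set $S^{(k)}$ and set it equal to $R^{(k)}$.

6. Let $\hat{\alpha}_{R^{(k)}}$ denote the vector consisting of all components of $\hat{\alpha}_{LS}$ in $R^{(k)}$.

7. Form a steering matrix $\Phi_\alpha^{(k)}$ by deleting all rows and columns in the matrix $(M^T M)^{-1}$ that are specified by $P^{(k)}$.

8. Calculate $\lambda^{(k)} = (\Phi_\alpha^{(k)})^{-1} \hat{\alpha}_{R^{(k)}}$. If all components in $\lambda^{(k)}$ are negative, go to step 13; otherwise, continue.

9. Calculate $\lambda_{\max}^{(k)} = \max_j \lambda_j^{(k)}$ and move its index in $R^{(k)}$ to $P^{(k)}$.

10. Form another matrix $\Psi_\lambda^{(k)}$ by deleting every column of $(M^T M)^{-1}$ specified by $P^{(k)}$.

11. Set $\hat{\alpha}_{S^{(k)}} = \hat{\alpha}_{LS} - \Psi_\lambda^{(k)} \lambda^{(k)}$.

12. If any component of $\hat{\alpha}_{S^{(k)}}$ in $S^{(k)}$ is negative, then move it from $P^{(k)}$ to $R^{(k)}$ and go to step 6.

13. Form another matrix $\Psi_\lambda^{(k)}$ by deleting every column of $(M^T M)^{-1}$ specified by $P^{(k)}$.

14. Set $\hat{\alpha}_{NCLS}^{(k)} = \hat{\alpha}_{LS} - \Psi_\lambda^{(k)} \lambda^{(k)}$. Go to step 3.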

To summarize the above procedure, we assume that the current iteration is k. The NCLS algorithm begins by calculating the unconstrained least-squares solution, $\hat{\alpha}_{LS}$. If all components in $\hat{\alpha}_{LS}$ are positive, the algorithm is terminated. Otherwise, all negative components are identified and their corresponding indices are moved to the active set $R^{(k)}$. In the meantime, a duplicate set of $R^{(k)}$, referred to as $S^{(k)}$, is introduced for the purpose of keeping track of the current negative components of $\hat{\alpha}$ during the k-th iteration. The steering matrix $\Phi_\alpha^{(k)}$ is then formed and the Lagrange multiplier vector $\lambda^{(k)}$ that will be used to steer each negative component of $\hat{\alpha}$ to zero is calculated. If $\lambda^{(k)}$ has a nonnegative component, the index of the maximum component of $\lambda^{(k)}$ is shuffled from $R^{(k)}$ to $P^{(k)}$. Since the loop from step 6 to step 12 may be repeated over and over again during a single iteration, $S^{(k)}$ is used to check whether all previously identified indices of maximum components of $\lambda^{(k)}$ should be retained in $P^{(k)}$ or moved back to $R^{(k)}$. Once all the values of $\lambda^{(k)}$ are negative, $\hat{\alpha}_{NCLS}^{(k)}$ is recalculated. It should be noted that the two iterative equations specified by (3.21) and (3.22) are carried out by step 14 and step 8, respectively.
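
For readers who wish to experiment, the nonnegatively constrained problem (3.18) can also be solved with a compact active-set routine in the style of Lawson and Hanson's NNLS. The sketch below is not the 14-step NCLS algorithm above; it is a swapped-in, simplified solver written under our own assumptions (the function name `nnls_active_set`, the tolerance and the synthetic data are all hypothetical). It returns the estimate together with the multiplier vector $\lambda = M^T(r - M\hat{\alpha})$ of (3.22), so that the optimality condition (3.24) can be inspected.

```python
import numpy as np

def nnls_active_set(M, r, tol=1e-10, max_iter=None):
    """Lawson-Hanson style active-set solver for min ||M a - r||^2 subject to a >= 0."""
    p = M.shape[1]
    if max_iter is None:
        max_iter = 3 * p
    passive = np.zeros(p, dtype=bool)        # True -> index belongs to the passive set P
    a = np.zeros(p)
    lam = M.T @ (r - M @ a)                  # multiplier vector, cf. (3.22)
    it = 0
    while (~passive).any() and lam[~passive].max() > tol and it < max_iter:
        it += 1
        # Move the active index with the largest multiplier into the passive set.
        j = np.argmax(np.where(~passive, lam, -np.inf))
        passive[j] = True
        # Unconstrained least squares restricted to the passive columns.
        s = np.zeros(p)
        s[passive] = np.linalg.lstsq(M[:, passive], r, rcond=None)[0]
        # If a passive component became non-positive, step back until feasible.
        while passive.any() and s[passive].min() <= 0:
            bad = passive & (s <= 0)
            step = np.min(a[bad] / (a[bad] - s[bad]))
            a = a + step * (s - a)
            passive &= a > tol               # components driven to zero return to R
            a[~passive] = 0.0
            s = np.zeros(p)
            s[passive] = np.linalg.lstsq(M[:, passive], r, rcond=None)[0]
        a = s
        lam = M.T @ (r - M @ a)
    return a, lam

# Synthetic pixel whose unconstrained estimate has a negative component.
rng = np.random.default_rng(4)
M = rng.random((30, 4))
r = M @ np.array([0.7, 0.5, -0.2, 0.0])
a_ncls, lam = nnls_active_set(M, r)
```

At the returned solution the multipliers vanish (up to round-off) on the passive set and are non-positive on the active set, in line with (3.24).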

Two comments on NCLS are noteworthy. First, the application of NCLS to subpixel target detection has been overlooked in the past because it requires complete prior knowledge of the targets present in an image and because NCLS has been used primarily for unmixing the abundance fractions of all targets rather than detecting a particular target in a linear mixture. As for quantification of abundance fractions, NCLS cannot compete against the FCLS in Heinz and Chang (1999b) and Heinz and Chang (2001). Second, the NCLS algorithm has been shown to converge in all the experiments. However, on some rare occasions the NCLS algorithm could oscillate between two passive sets because it steers back to zero a passive set which usually contains more than one negative component during an iteration. Should this occur, the NCLS algorithm would adjust only one component at a time at each iteration. More details about the analysis of the NCLS algorithm can be found in Heinz and Chang (1999a).



3.6 HYPERSPECTRAL IMAGE EXPERIMENTS 

In this section, experiments using the two image scenes in Figs. 1.6(a) and 1.7(a) are conducted to evaluate OSP, SCLS and NCLS. A thorough comparative analysis including various scenarios of computer simulations will be postponed to Chapter 7, where all the subpixel detection methods introduced in Chapters 3-6 will be studied in great detail. Figs. 3.1-3.3 show the results of applying OSP, SCLS and NCLS to the AVIRIS image in Fig. 1.6(a), where figures labeled by (a), (b), (c) and (d) show cinders, playa, rhyolite and vegetation as desired targets, respectively, and figures labeled by (e) are results for the shade.












The prior knowledge of the five signatures used in Figs. 3.1-3.3 was obtained directly from the scene, as was done in Harsanyi and Chang (1994). As we can see from these figures, NCLS outperformed SCLS and OSP, while OSP and SCLS performed similarly. It is interesting to note that OSP picked up an anomaly, marked by a white circle in Fig. 3.1(b), which was not picked up by SCLS or NCLS. However, this anomaly was detected when NCLS was implemented in an unsupervised manner (Chang and Heinz, 2000a; Chang and Heinz, 2000b). This anomalous target will be discussed in Chapter 6.

A second scene used for experiments is the 15-panel HYDICE scene shown in Fig. 1.7. Figs. 3.4-3.6 show the results of OSP, SCLS and NCLS in detection of the 15 panels, where the panel signatures used as target information in M were given by P1, P2, P3, P4 and P5 plotted in Fig. 1.8.

which case the amounts of the estimated abundance fractions were very small and were not good for detection. Such a situation will become even worse when the image background is complicated and many target signatures are present in the image.



3.7 CONCLUSIONS 

In this chapter, least-squares based approaches to subpixel detection are studied. The chapter first introduces orthogonal subspace projection (OSP), which does not impose any constraint on target abundance fractions. OSP makes use of an orthogonal subspace projector to annihilate undesired target signatures prior to detection. Although the signal-to-noise ratio (SNR) criteria used in (3.6) and (3.10) assume that the noise is additive and white (i.e., uncorrelated), this assumption can be relaxed by a whitening process described in Section 12.3 of Chapter 12. Also, it is worth noting that, unlike the maximum likelihood approach (Settle, 1996), OSP does not assume that the noise is Gaussian. As shown in Du and Ren (2002), OSP can be slightly improved if the noise is appropriately whitened. However, how to reliably estimate the noise correlation matrix is generally a challenging problem, which will be discussed in Section 17.3 of Chapter 17. So, a common practice is to use OSP without noise whitening, since hyperspectral
imagery generally has high signal-to-noise ratio (SNR) and the interference is more 
dominant than noise. Additionally, due to high spectral resolution of hyperspectral 
imagery the noise correlation is considered to be much less than it is in multispectral 
imagery. Therefore, such compromise does not have much impact on OSP performance. 

Although OSP is shown to perform well in general, its detection performance can be 
further improved by imposing constraints on target abundance fractions. Two partially 
constrained least-squares methods, SCLS and NCLS are investigated in this chapter. 
With the abundance sum-to-one constraint, the SCLS subpixel detector can be solved analytically and expressed as an unconstrained least-squares solution plus a correction term. Since no closed-form solution can be derived for NCLS due to the abundance nonnegativity constraint, a fast and efficient numerical algorithm is developed to generate a desired optimal subpixel detector. The reason that only partial constraints are considered is that we are interested in subpixel target detection rather than quantification of target abundance fractions. To implement full constraints (i.e., ASC and ANC), the target abundance fractions must be constrained to the range [0,1]. On many occasions, such as a
complicated background where many targets may be present in an image, the estimated 
abundance fractions among targets may not differ very much for detection. This may 
result in poor target detection. On the other hand, if there is no constraint imposed on target abundance, the estimated abundance fractions may vary over a very wide range. This may also result in poor performance (Chang and Heinz, 2000a; Chang and Heinz, 2000b). In either case, constrained subpixel detection may be
useful. As demonstrated in experiments, partially constrained least-squares based 
subpixel detectors generally perform significantly better than the unconstrained subpixel 
detectors. However, as indicated previously, in some cases where many target signatures 
are present in an image scene, SCLS may not perform as well as unconstrained subpixel 
detectors in detection because of ASC. Under this circumstance, NCLS is always
preferred to SCLS.

TARGET SIGNATURE-CONSTRAINED SUBPIXEL
DETECTION: LINEARLY CONSTRAINED
MINIMUM VARIANCE (LCMV)



In Chapter 3, we studied partially constrained least squares approaches to target 
abundance-constrained subpixel detection. One of their practical limitations is the 
requirement of prior knowledge about the target signature matrix M. In reality, such 
information is very difficult to obtain, if not impossible. It is particularly true for 
hyperspectral imagery, which may contain many unknown signal sources extracted by 
high spectral resolution sensors such as HYDICE. These unidentified sources may 
include interferers, nonstationary background signatures and natural signatures that cannot 
be visually inspected from an image scene. Under such circumstances, finding a well-represented target signature matrix M may not be realistic. In order to resolve this problem, this chapter develops an alternative approach, referred to as a linearly constrained minimum variance (LCMV) approach, which only constrains targets of interest while minimizing the energy of unknown signal sources. Two special versions can be derived from LCMV: the constrained energy minimization (CEM) filter and the target-constrained interference-minimized filter (TCIMF), where TCIMF can be viewed as an extension of the CEM filter. Interestingly, there is a close relationship among CEM, TCIMF and OSP: all three share the same functional form as a matched filter but use different levels of target knowledge.



4.1 INTRODUCTION 

Hyperspectral image analysis has become increasingly important in remote sensing 
image processing because hyperspectral image sensors are now capable of extracting many 
subtle material substances which usually cannot be resolved by multispectral sensors. 
However, because of their very high spectral resolution, many unknown and unidentified signal sources (referred to as interferers here) may also be extracted. These unexpected signal sources generally introduce additional interfering effects on target detection. Such phenomena have been demonstrated in Chang et al. (1998a). Since the
interference is generally unknown in nature and cannot be identified from an image scene 
by visual inspection, a challenging problem associated with it is how to minimize its 
interfering effect without actually identifying interferers. 






Over the past years many algorithms have been developed for hyperspectral image 
analysis. In particular, linear spectral mixture analysis discussed in Chapter 3 has been 
widely used for multispectral and hyperspectral image classification as well as for 
subpixel target detection. Unfortunately, it requires the complete knowledge of targets 
present in an image scene. In many practical applications, such knowledge including 
interferers is not known a priori and must be obtained directly from the scene. Three 
unsupervised methods were recently proposed in Chang et al. (2001b), which can be used 
to generate necessary interference information directly from an image scene. They will be 
discussed in depth in Chapter 5. One drawback of these approaches is that they need to know how many interferers must be generated.

In order to mitigate this dilemma, an alternative approach, called constrained energy 
minimization (CEM) was first proposed in Harsanyi (1993) and later published in 
Harsanyi et al. (1994). It did not require the knowledge of interferers. Instead, the only 
required knowledge was the desired target to be detected. Using a specific constraint, 
CEM designed a finite impulse response (FIR) filter to pass through the desired target
while minimizing its output energy resulting from target sources other than the desired 
target. The success of the CEM filter in AVIRIS and HYDICE data exploitation was 
further demonstrated in Farrand and Harsanyi (1997) and Resmini et al. (1997). 

One disadvantage of the CEM filter is its single-target detection. If we had known 
that there were targets in an image scene which were not of our interest, e.g. natural and 
background signatures, these targets could have been eliminated prior to detection rather 
than being minimized as was done by the CEM filter. In this case, the CEM filter still 
considered the known undesired targets as unknown interferers. Thus, instead of 
eliminating these undesired targets, it minimized their energies. Therefore, CEM does 
not take advantage of preprocessing the undesired targets to enhance its detection 
performance. This can actually be accomplished by a target-constrained interference-minimized filter (TCIMF) recently developed by Ren and Chang (2001). Coincidentally, the relationship between OSP and CEM can be further explored by TCIMF.

Another disadvantage of CEM is that the CEM filter can only detect one target at a time and is very sensitive to the target knowledge that is used in the constraint. If there are targets of the same type with slight spectral variability, the CEM filter may miss these
targets. This drawback can be remedied by a generalization of the CEM approach, referred 
to as linearly constrained minimum variance (LCMV) approach, which includes both 
CEM and TCIMF as special cases. 

The LCMV idea can be traced back to Frost's work (Frost III, 1972) in adaptive beamforming. In passive sensor array processing, what we are interested in is the desired direction of signal arrival from an array of sensors. If there is one direction of interest, we
can lock on that direction by designing a constrained FIR filter with a specific filter gain 
to look for signals coming from this particular direction. When the gain is constrained to 
a single direction with unity, the resulting FIR filter is called the minimum variance 
distortionless response (MVDR) beamformer (Van Veen and Buckley, 1988; Haykin 
1994). Now, if we interpret an array of sensors as a bank of spectral channels used in 
remote sensing instruments and the desired direction of signal arrival as the vector 
direction of the desired target signature, the MVDR beamformer becomes the CEM filter. 
An early idea of CEM was explored in chemical remote sensing (Althouse and Chang, 
1991). More recently, an extension of the CEM filter to detection of multiple targets was 
also studied in Chang and Ren (1999) and Chang et al. (2001c). It used Frost's idea to 
linearly constrain a set of multiple targets so that these targets can be passed through a 
designed FIR filter constrained by a set of filter gains. In the meantime, the filter output energy resulting from target sources other than the desired directions is minimized in the sense of variance. This can be achieved by augmenting the scalar unity used to constrain the CEM filter to a constraint vector, of which each component is used to constrain a specific target.

In some applications, there is always partial knowledge available about the image 
data, in which case we should take advantage of rather than discard this information. 
Specifically, we can categorize signal sources in an image scene into three signal classes. 
One is a desired target class, which consists of targets of interest. A second signal class, called an undesired target class, comprises all targets that are known but are either unwanted or of no interest. A third signal class is an interference class made up of
all unknown or unidentified signal sources resident in the image. An effective detector 
should best utilize its available target knowledge to achieve its optimal performance. In 
other words, it should detect the desired targets, eliminate the undesired targets, and 
minimize the effects caused by all unknown interferers simultaneously. The TCIMF developed in Ren and Chang (2001) accomplishes these three tasks in one single operation: it divides known targets into desired targets and undesired targets so as to detect the desired targets and eliminate the undesired targets at the same time. This advantage cannot be achieved by CEM. Additionally, TCIMF preserves the strength of CEM, which does not need to identify interferers but can still minimize interference. In a way, TCIMF combines Frost's idea with the oblique subspace projection (Scharf, 1991; Chang et al. 1998b) in such a manner that the desired targets are projected into the range space while the undesired targets are annihilated by mapping them into the null space. Specifically, TCIMF accomplishes detection of the desired targets, annihilation of the undesired targets and minimization of interfering effects at the same time, as opposed to CEM, which only achieves detection of a desired target and minimization of interfering effects with no elimination of undesired targets. As a consequence, TCIMF generally performs better than CEM. From this point of view, CEM can be viewed as a special version of TCIMF.

Since both CEM and TCIMF deal with an over-determined system where the 
number of pixel samples is generally greater than that of spectral bands, the rank of the 
sample correlation matrix that is used to design optimal weights for the filters will 
become crucial. This issue is determined by noise sensitivity and will be investigated in 
Chapter 7 in detail. Furthermore, it is also very closely related to the intrinsic 
dimensionality of the image data that will be discussed in Chapter 17. 



4.2 LCMV TARGET DETECTOR 

Assume that $\{r_1, r_2, \ldots, r_N\}$ is a collection of N image pixels in a remotely sensed image, where $r_i = (r_{i1}, r_{i2}, \ldots, r_{iL})^T$ for $1 \le i \le N$ is an L-dimensional pixel vector. As defined in Chapter 3, $m_1, m_2, \ldots, m_p$ are the spectral signatures of the p targets $t_1, t_2, \ldots, t_p$ present in the image and M is the target signature matrix formed by $[m_1\, m_2 \cdots m_p]$. The goal is to design a constrained FIR linear filter, specified by an L-dimensional weight vector $w = (w_1, w_2, \ldots, w_L)^T$ consisting of a set of L filter coefficients $\{w_1, w_2, \ldots, w_L\}$, that minimizes the filter output energy subject to the following constraint

$M^T w = c$ where $m_j^T w = \sum_{l=1}^{L} w_l m_{jl} = c_j$ for $1 \le j \le p$ (4.1)

where $c = (c_1, c_2, \ldots, c_p)^T$ is a constraint vector.

Now, let $y_i$ denote the output of the designed FIR filter resulting from the input $r_i$. Then $y_i$ can be expressed by

$y_i = \sum_{l=1}^{L} w_l r_{il} = w^T r_i = r_i^T w$. (4.2)

Using (4.1)-(4.2), Frost's LCMV beamformer becomes an LCMV-based target detector, which minimizes the average energy of the total filter outputs

$(1/N)\left[\sum_{i=1}^{N} y_i^2\right] = (1/N)\left[\sum_{i=1}^{N} (r_i^T w)^T r_i^T w\right] = w^T \left((1/N)\left[\sum_{i=1}^{N} r_i r_i^T\right]\right) w = w^T R_{L \times L}\, w$ (4.3)

where $R_{L \times L} = (1/N)\left[\sum_{i=1}^{N} r_i r_i^T\right]$ is the sample spectral correlation matrix of the image. If we let $X = [r_1\, r_2 \cdots r_N]$ be the data matrix formed by $\{r_1, r_2, \ldots, r_N\}$, then $R_{L \times L}$ is also equal to $(1/N)[X X^T]$. Combining (4.3) with the constraint equation (4.1) results in the following linearly constrained optimization problem

$\min_w \{w^T R_{L \times L}\, w\}$ subject to $M^T w = c$. (4.4)



The solution to (4.4) can be obtained by (Chang et al. 1999; Chang et al. 2001c)

$w^{LCMV} = R_{L \times L}^{-1} M \left(M^T R_{L \times L}^{-1} M\right)^{-1} c$. (4.5)
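
For completeness, the closed-form LCMV weight vector $w^{LCMV} = R_{L \times L}^{-1} M (M^T R_{L \times L}^{-1} M)^{-1} c$ can be recovered with a standard Lagrange multiplier argument. The derivation below is a sketch that assumes $R_{L \times L}$ is invertible and M has full column rank; the p-dimensional multiplier vector $\lambda$ is introduced here only for illustration.

```latex
\begin{aligned}
J(w,\lambda) &= w^T R_{L\times L}\, w + \lambda^T \left(M^T w - c\right), \\
\frac{\partial J}{\partial w} &= 2 R_{L\times L}\, w + M\lambda = 0
  \;\Rightarrow\; w = -\tfrac{1}{2}\, R_{L\times L}^{-1} M \lambda, \\
M^T w = c &\;\Rightarrow\; \lambda = -2 \left(M^T R_{L\times L}^{-1} M\right)^{-1} c, \\
w^{LCMV} &= R_{L\times L}^{-1} M \left(M^T R_{L\times L}^{-1} M\right)^{-1} c .
\end{aligned}
```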

So, an LCMV detector, $\delta_{LCMV}(r)$, can be derived by substituting $w^{LCMV}$ for w in (4.2) and is given by

$\delta_{LCMV}(r) = (w^{LCMV})^T r = \left(R_{L \times L}^{-1} M \left(M^T R_{L \times L}^{-1} M\right)^{-1} c\right)^T r$. (4.6)

It should be noted that the optimal weight vector $w^{LCMV}$ in (4.6) of the designed LCMV-based target detector uses the sample spectral correlation matrix $R_{L \times L}$ to capture the statistics of the first two orders among the image pixel vectors $r_1, r_2, \ldots, r_N$. Two special cases of (4.1) are of particular interest.
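
A small numerical sketch of the LCMV detector may help fix ideas. It is written under our own assumptions: the helper names `sample_correlation` and `lcmv_weights` are hypothetical, the data are random, and the sample correlation matrix is assumed to be invertible. The weight vector is computed as $R^{-1} M (M^T R^{-1} M)^{-1} c$ using linear solves rather than explicit inverses.

```python
import numpy as np

def sample_correlation(X):
    """R = (1/N) X X^T for an L x N data matrix X whose columns are pixel vectors."""
    return X @ X.T / X.shape[1]

def lcmv_weights(R, M, c):
    """LCMV weight vector R^{-1} M (M^T R^{-1} M)^{-1} c, cf. (4.5)."""
    Rinv_M = np.linalg.solve(R, M)               # R^{-1} M without forming R^{-1}
    return Rinv_M @ np.linalg.solve(M.T @ Rinv_M, c)

# Synthetic scene (all values are made up for illustration).
rng = np.random.default_rng(3)
L, N, p = 20, 500, 2
X = rng.random((L, N))                           # data matrix of N pixels
M = rng.random((L, p))                           # two target signatures
c = np.array([1.0, 0.0])                         # pass the first, null the second
R = sample_correlation(X)
w = lcmv_weights(R, M, c)
y = w @ X                                        # detector outputs for all pixels
```

The choice of c here already hints at the two special cases discussed next: a scalar constraint of one for a single signature, and a vector of ones and zeros for desired and undesired signatures.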

4.2.1 Constrained Energy Minimization (CEM)

If we are only interested in a single target signature d, that is, M = d, the constraint in (4.1) reduces to $d^T w = \sum_{l=1}^{L} d_l w_l = 1$ and the constraint vector c becomes the unity scalar. In this particular case, (4.4) is reduced to

$\min_w \{w^T R_{L \times L}\, w\}$ subject to $d^T w = 1$ (4.7)

with the optimal solution given by

$w^{CEM} = \dfrac{R_{L \times L}^{-1} d}{d^T R_{L \times L}^{-1} d}$. (4.8)

Substituting the optimal weight $w^{CEM}$ for w in (4.2) results in the CEM filter, which implements a detector, $\delta_{CEM}(r)$, given by

$\delta_{CEM}(r) = (w^{CEM})^T r = \left(d^T R_{L \times L}^{-1} d\right)^{-1} \left(R_{L \times L}^{-1} d\right)^T r$. (4.9)

The approach to solving (4.7) via (4.8) is referred to as CEM in Harsanyi's dissertation (1993).
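
The CEM filter is straightforward to prototype. The following sketch is illustrative only: the function name `cem_filter` is hypothetical, the scene is random synthetic data with one planted full-pixel target, and the sample correlation matrix is assumed to be invertible. It computes the weight vector $R^{-1} d / (d^T R^{-1} d)$ and applies it to every pixel.

```python
import numpy as np

def cem_filter(X, d):
    """CEM weight vector, cf. (4.8), and detector outputs, cf. (4.9), for an L x N data matrix X."""
    R = X @ X.T / X.shape[1]              # sample spectral correlation matrix
    Rinv_d = np.linalg.solve(R, d)        # R^{-1} d without forming the inverse
    w = Rinv_d / (d @ Rinv_d)
    return w, w @ X

rng = np.random.default_rng(5)
L, N = 20, 400
X = rng.random((L, N))                    # synthetic background pixels
d = rng.random(L) + 1.0                   # hypothetical target signature
X[:, 7] = d                               # plant one full-pixel target
w, y = cem_filter(X, d)
```

By construction the planted pixel produces an output of exactly one (up to round-off), while the average output energy over the scene is minimized.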

4.2.2 Target-Constrained Interference-Minimized Filter (TCIMF) 

It has been demonstrated that interference played a significant role in hyperspectral 
image analysis. This is primarily due to the fact that many subtle materials can be 
revealed by hyperspectral image sensors with very fine spatial and spectral resolution. 
Unfortunately, this also results in extraction of additional unknown signal sources. A 
recently developed target-constrained interference-minimized filter (TCIMF) assumed that 
an image scene was made up of three separate signal sources, D (desired targets), U 
(undesired targets) and I (interference). The idea to separate interference from a signal 
model as an independent and separate source was previously explored in Chang et al. 
(1998a) and Chang and Du (1999). The CEM filter takes care of the interference problem using (4.7) by constraining the desired target signature d while minimizing the output energy resulting from other target sources. However, when information is available about the undesired target signatures in U, simply discarding this information may not be the best way to achieve optimal detection performance. TCIMF resolves this problem. It implements a constraint vector that simultaneously constrains D and U in such a way that the desired target signatures in D can be detected, while the undesired target signatures in U are eliminated at the same time. A similar LCMV-based filter was also proposed to extend CEM to detect multiple desired target signatures through a constraint vector (Chang and Ren, 1999). However, the constraint vector used there was designed only for the purpose of multiple-target detection, not for annihilation of undesired target signatures in U as is done in TCIMF to enhance target detectability. The idea of TCIMF can be briefly described as follows.

Let $D = [d_1\, d_2 \cdots d_p]$ and $U = [u_1\, u_2 \cdots u_q]$ denote the desired target signature matrix and the undesired target signature matrix, respectively. A constraint can be derived from (4.1) by replacing the target signature matrix M and the constraint vector c with the desired-undesired target signature matrix $[D\ U]$ and the desired-undesired target signature constraint vector $c = [1_{p \times 1}^T\ 0_{q \times 1}^T]^T$ as follows,

$[D\ U]^T w = \begin{bmatrix} 1_{p \times 1} \\ 0_{q \times 1} \end{bmatrix}$ (4.10)



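where $\mathbf{1}_{p\times 1}$ is a $p \times 1$ column vector with ones in all components and $\mathbf{0}_{q\times 1}$ is a $q \times 1$
column constraint vector with all zeros in its components. It is worth noting that in
(4.10), $\mathbf{1}_{p\times 1}$ is used to constrain the desired target signatures in D as performed by the CEM
filter in (4.7), whereas $\mathbf{0}_{q\times 1}$ is included to annihilate the undesired target signatures in U,
in a fashion similar to the oblique subspace projection (Scharf, 1991,
Chang et al., 1998a-1998b). By taking advantage of (4.10), CEM can be extended to the
following linearly constrained optimization problem

$$\min_{\mathbf{w}} \left\{ \mathbf{w}^T \mathbf{R}_{L\times L} \mathbf{w} \right\} \ \text{subject to} \ [\mathbf{D}\ \mathbf{U}]^T \mathbf{w} = \begin{bmatrix} \mathbf{1}_{p\times 1} \\ \mathbf{0}_{q\times 1} \end{bmatrix} \qquad (4.11)$$

where the optimal weight vector $\mathbf{w}^{\mathrm{TCIMF}}$ can be solved by

$$\mathbf{w}^{\mathrm{TCIMF}} = \mathbf{R}_{L\times L}^{-1} [\mathbf{D}\ \mathbf{U}] \left( [\mathbf{D}\ \mathbf{U}]^T \mathbf{R}_{L\times L}^{-1} [\mathbf{D}\ \mathbf{U}] \right)^{-1} \begin{bmatrix} \mathbf{1}_{p\times 1} \\ \mathbf{0}_{q\times 1} \end{bmatrix} \qquad (4.12)$$

The filter specified by the weight vector $\mathbf{w}^{\mathrm{TCIMF}}$ given by (4.12) is called TCIMF and
can be implemented as a detector, $\delta_{\mathrm{TCIMF}}(\mathbf{r})$, by

$$\delta_{\mathrm{TCIMF}}(\mathbf{r}) = (\mathbf{w}^{\mathrm{TCIMF}})^T \mathbf{r} \qquad (4.13)$$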
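
As a rough numerical illustration of (4.12) and (4.13), the following NumPy sketch builds the TCIMF weight vector and checks the constraints in (4.10). The random signatures, the random pixel matrix `X` and the helper name `tcimf_weights` are hypothetical stand-ins introduced here for illustration; they are not data or code from the experiments in this chapter.

```python
import numpy as np

def tcimf_weights(R, D, U):
    """TCIMF weight vector of (4.12) for an L x L correlation matrix R,
    desired signatures D (L x p) and undesired signatures U (L x q)."""
    DU = np.hstack([D, U])                          # [D U], size L x (p + q)
    c = np.concatenate([np.ones(D.shape[1]),        # 1_{p x 1}
                        np.zeros(U.shape[1])])      # 0_{q x 1}
    Rinv_DU = np.linalg.solve(R, DU)                # R^{-1}[D U] without forming R^{-1}
    return Rinv_DU @ np.linalg.solve(DU.T @ Rinv_DU, c)

# Hypothetical data: L bands, N pixel vectors stored as columns of X.
rng = np.random.default_rng(0)
L, N = 20, 500
D = rng.random((L, 1))
U = rng.random((L, 2))
X = rng.random((L, N))
R = X @ X.T / N                                     # sample correlation matrix

w = tcimf_weights(R, D, U)
detector_output = w @ X                             # (4.13) applied to every pixel
print(D.T @ w, U.T @ w)                             # close to 1 and close to 0
```

Solving the two linear systems instead of inverting the matrices explicitly is a numerical convenience of this sketch, not something prescribed by (4.12).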



4.3 RELATIONSHIP AMONG OSP, CEM AND TCIMF 

In this section, we present an interesting finding (Chang, 2002a). If we compare
$\delta_{\mathrm{OSP}}(\mathbf{r}) = \mathbf{d}^T P_U^{\perp} \mathbf{r}$ specified by (3.9) against $\delta_{\mathrm{CEM}}(\mathbf{r}) = (\mathbf{d}^T \mathbf{R}_{L\times L}^{-1} \mathbf{d})^{-1} \mathbf{d}^T \mathbf{R}_{L\times L}^{-1} \mathbf{r}$ specified by
(4.9), we will discover that there is a close relationship between $P_U^{\perp}$ used in $\delta_{\mathrm{OSP}}(\mathbf{r})$ and
$\mathbf{R}_{L\times L}^{-1}$ used in $\delta_{\mathrm{CEM}}(\mathbf{r})$. Both operate in the functional form of a matched filter specified
by (3.7) with the same matched signal d but a different scale constant $\kappa$. Interestingly,
the significance of the constant $\kappa$ has been overlooked. In the matched filter derivation of
(3.7), the constant $\kappa$ resulting from Schwarz's inequality generally plays no role in signal
detection and is usually assumed to be 1 without loss of signal detectability, e.g.
$\delta_{\mathrm{OSP}}(\mathbf{r})$. This is not the case for abundance estimation. As noted in Settle (1996), the
constant $\kappa$ determines the estimation error of the abundance vector $\boldsymbol{\alpha}$. So, the
relationship between $\delta_{\mathrm{OSP}}(\mathbf{r})$ and $\delta_{\mathrm{CEM}}(\mathbf{r})$ can be described from two viewpoints, signal
detection and signal estimation.

In signal detection, the knowledge of U, which is assumed to be known in $\delta_{\mathrm{OSP}}(\mathbf{r})$,
is not available in $\delta_{\mathrm{CEM}}(\mathbf{r})$, which must therefore estimate $P_U^{\perp}$ directly from the image data.
One way of doing so is to approximate $P_U^{\perp}$ in the sense of minimum least squares
error by the sample spectral information $\mathbf{R}_{L\times L}^{-1}$, which can be obtained by inverting the
sample correlation matrix of the image data. More specifically, $\delta_{\mathrm{CEM}}(\mathbf{r})$ makes use of
the a posteriori information $\mathbf{R}_{L\times L}^{-1}$ to approximate the a priori information $P_U^{\perp}$ to
accomplish what $\delta_{\mathrm{OSP}}(\mathbf{r})$ does. Since $\delta_{\mathrm{OSP}}(\mathbf{r})$ assumes prior knowledge of the
abundance vector $\boldsymbol{\alpha}$, there is no need for abundance estimation, in which case $\kappa = 1$.

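Mathematically, we can further demonstrate that both $\delta_{\mathrm{OSP}}(\mathbf{r})$ and $\delta_{\mathrm{CEM}}(\mathbf{r})$ are
special versions of $\delta_{\mathrm{TCIMF}}(\mathbf{r})$, where $\delta_{\mathrm{OSP}}(\mathbf{r})$ can be shown to be related to $\delta_{\mathrm{CEM}}(\mathbf{r})$
through $\delta_{\mathrm{TCIMF}}(\mathbf{r})$. On one hand, if the knowledge of U is absent in the $\mathbf{w}^{\mathrm{TCIMF}}$ in (4.12),
i.e. $\mathbf{U} = \emptyset$, the empty set, then $\delta_{\mathrm{TCIMF}}(\mathbf{r})$ becomes $\delta_{\mathrm{CEM}}(\mathbf{r})$. On the other hand, we may
assume that $\delta_{\mathrm{TCIMF}}(\mathbf{r})$ does not use the spectral correlation provided by data samples, i.e.
$\mathbf{R}_{L\times L} = \mathbf{I}_{L\times L}$, in which case $\delta_{\mathrm{TCIMF}}(\mathbf{r})$ is denoted by $\delta_{\mathrm{TCIMF}}^{\mathbf{R}=\mathbf{I}}(\mathbf{r})$. In order to relate $\delta_{\mathrm{TCIMF}}^{\mathbf{R}=\mathbf{I}}(\mathbf{r})$ to
$\delta_{\mathrm{OSP}}(\mathbf{r})$, let $\mathbf{D} = \mathbf{d}$ and $\mathbf{U} = [\mathbf{m}_1\ \mathbf{m}_2 \cdots \mathbf{m}_{p-1}]$. Then $n_D = 1$ and $n_U = p-1$, and
$\mathbf{c} = [\mathbf{1}_{p\times 1}^T\ \mathbf{0}_{q\times 1}^T]^T$ in (4.10) becomes $\mathbf{c} = [1\ \mathbf{0}_{(p-1)\times 1}^T]^T$. Using a matrix inverse formula such
as the equalities in Settle (1996), we can prove that $\delta_{\mathrm{TCIMF}}^{\mathbf{R}=\mathbf{I}}(\mathbf{r})$ turns out to be a scaled version of $\delta_{\mathrm{OSP}}(\mathbf{r})$.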



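$$\begin{aligned}
\mathbf{w}^{\mathrm{TCIMF}}_{\mathbf{R}_{L\times L}=\mathbf{I}_{L\times L}}
&= [\mathbf{d}\ \mathbf{U}] \left( [\mathbf{d}\ \mathbf{U}]^T [\mathbf{d}\ \mathbf{U}] \right)^{-1} \begin{bmatrix} 1 \\ \mathbf{0}_{(p-1)\times 1} \end{bmatrix} \\
&= [\mathbf{d}\ \mathbf{U}] \begin{bmatrix} \mathbf{d}^T\mathbf{d} & \mathbf{d}^T\mathbf{U} \\ \mathbf{U}^T\mathbf{d} & \mathbf{U}^T\mathbf{U} \end{bmatrix}^{-1} \begin{bmatrix} 1 \\ \mathbf{0}_{(p-1)\times 1} \end{bmatrix} \\
&= [\mathbf{d}\ \mathbf{U}] \begin{bmatrix} \kappa & -\kappa\, \mathbf{d}^T (\mathbf{U}^{\#})^T \\ -\kappa\, \mathbf{U}^{\#}\mathbf{d} & (\mathbf{U}^T\mathbf{U})^{-1} + \kappa\, \mathbf{U}^{\#}\mathbf{d}\mathbf{d}^T(\mathbf{U}^{\#})^T \end{bmatrix} \begin{bmatrix} 1 \\ \mathbf{0}_{(p-1)\times 1} \end{bmatrix} \\
&= [\mathbf{d}\ \mathbf{U}] \begin{bmatrix} \kappa \\ -\kappa\, \mathbf{U}^{\#}\mathbf{d} \end{bmatrix} \\
&= \kappa\, \mathbf{d} - \kappa\, \mathbf{U}\mathbf{U}^{\#}\mathbf{d} = \kappa (\mathbf{I} - \mathbf{U}\mathbf{U}^{\#}) \mathbf{d} = \kappa\, P_U^{\perp} \mathbf{d}
\end{aligned} \qquad (4.14)$$

where $\kappa = (\mathbf{d}^T P_U^{\perp} \mathbf{d})^{-1}$ and $\mathbf{U}^{\#} = (\mathbf{U}^T\mathbf{U})^{-1}\mathbf{U}^T$ is the pseudo-inverse of U defined in
(3.3), and

$$\delta_{\mathrm{TCIMF}}^{\mathbf{R}=\mathbf{I}}(\mathbf{r}) = \left( \mathbf{w}^{\mathrm{TCIMF}}_{\mathbf{R}_{L\times L}=\mathbf{I}_{L\times L}} \right)^T \mathbf{r} = \kappa\, \mathbf{d}^T P_U^{\perp} \mathbf{r} \qquad (4.15)$$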

Equation (4.15) implies that when no sample spectral correlation is considered,
$\delta_{\mathrm{TCIMF}}^{\mathbf{R}=\mathbf{I}}(\mathbf{r})$ is actually a posteriori OSP, which is identical to $\delta_{\mathrm{LSOSP}}(\mathbf{r})$ or $\delta_{\mathrm{GML}}(\mathbf{r})$ specified by
(8.24) and (8.30) discussed in Chapter 8.

On the other hand, (4.14) suggests that $\delta_{\mathrm{OSP}}(\mathbf{r})$ can be considered as a whitened $\delta_{\mathrm{TCIMF}}(\mathbf{r})$
with the image data zero-mean whitened, i.e. $\mathbf{R}_{L\times L} = \mathbf{I}_{L\times L}$. However, it should
be borne in mind that there is a significant difference in filter design between $\delta_{\mathrm{OSP}}(\mathbf{r})$ and
$\delta_{\mathrm{TCIMF}}^{\mathbf{R}=\mathbf{I}}(\mathbf{r})$. The $\delta_{\mathrm{OSP}}(\mathbf{r})$ is derived from the linear mixture signal model specified by (3.2),
where the noise is assumed to be additive, but not necessarily Gaussian as assumed in
maximum likelihood estimation. By contrast, $\delta_{\mathrm{TCIMF}}^{\mathbf{R}=\mathbf{I}}(\mathbf{r})$ minimizes the filter output
energy resulting from signals other than d and U without assuming a
signal detection model, in which case $\delta_{\mathrm{TCIMF}}^{\mathbf{R}=\mathbf{I}}(\mathbf{r})$ suppresses noise and all signal sources
except d and U. This explains why there is an additional constant $\kappa$ in (4.15), which is
a consequence of the interference and noise suppression resulting from $\delta_{\mathrm{TCIMF}}(\mathbf{r})$.
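
The reduction in (4.14) can be checked numerically. The sketch below, which uses randomly generated (hypothetical) signatures rather than any spectra from the book, compares the TCIMF weight vector obtained with the identity matrix in place of the sample correlation matrix against $\kappa P_U^{\perp} \mathbf{d}$.

```python
import numpy as np

rng = np.random.default_rng(1)
L, q = 15, 3
d = rng.random(L)                        # hypothetical desired signature
U = rng.random((L, q))                   # hypothetical undesired signatures

# TCIMF weight with R = I: [d U]([d U]^T [d U])^{-1} [1, 0, ..., 0]^T
dU = np.column_stack([d, U])
c = np.zeros(q + 1)
c[0] = 1.0
w_tcimf = dU @ np.linalg.solve(dU.T @ dU, c)

# Right-hand side of (4.14): kappa * P_U_perp * d
P_perp = np.eye(L) - U @ np.linalg.pinv(U)          # I - U U^#
kappa = 1.0 / (d @ P_perp @ d)
w_osp = kappa * (P_perp @ d)

print(np.max(np.abs(w_tcimf - w_osp)))              # difference at rounding level
```

The scale constant $\kappa$ is what makes the filter output equal to one on d, which is the constraint inherited from (4.10).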

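If we assume that a detector is a linear filter specified by a filter vector w, a detector
can be expressed by an inner product of a filter vector w with an image pixel vector r.
Namely, $\delta_{\mathbf{w}}(\mathbf{r}) = \mathbf{w}^T \mathbf{r}$ where the filter vector w is specified by a particular detector or
classifier. Table 4.1 summarizes the relationship among $\delta_{\mathrm{OSP}}(\mathbf{r})$, $\delta_{\mathrm{CEM}}(\mathbf{r})$ and $\delta_{\mathrm{TCIMF}}(\mathbf{r})$.

Table 4.1. Relationship among $\delta_{\mathrm{OSP}}(\mathbf{r})$, $\delta_{\mathrm{CEM}}(\mathbf{r})$ and $\delta_{\mathrm{TCIMF}}(\mathbf{r})$

detector       matched       annihilated   filter vector w       spectral
               signatures    signatures                          correlation
OSP            d             U             (3.9)                 $\mathbf{I}_{L\times L}$
CEM            d             none          $\mathbf{w}^{\mathrm{CEM}}$ (4.8)          $\mathbf{R}_{L\times L}$
TCIMF          D             U             $\mathbf{w}^{\mathrm{TCIMF}}$ (4.12)       $\mathbf{R}_{L\times L}$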



4.4 A COMPARATIVE ANALYSIS BETWEEN CEM AND TCIMF 

This section conducts a comparative analysis of CEM and TCIMF
to demonstrate their relative performance using a series of computer simulations and real
hyperspectral data.

4.4.1 Computer Simulations 

The AVIRIS laboratory data set shown in Fig. 1.5 and considered in Chapter 2 was
used for performance evaluation. Three signatures, blackbrush, creosote leaves and
sagebrush, were used as targets of interest, red soil as a background signature and dry
grass as an interferer. A set of 300 mixed pixels was simulated. Each simulated pixel
contained one background signature, red soil, with abundance fixed at 5% and
one interferer, dry grass, with abundance fixed at 5%. In addition, each pixel also
contained the two undesired target signatures, creosote leaves and sagebrush, evenly split
at 45% abundance each. At pixels 50, 100, 150, 200, 250 and 300 we added the desired target
signature, blackbrush, with abundance 5%, 10%, 20%, 40%, 60% and 80% respectively
while evenly reducing the abundance fractions of the two undesired target signatures,
creosote leaves and sagebrush. For example, pixel 50 contained 5% blackbrush as
the desired target signature, 42.5% creosote leaves and 42.5% sagebrush as undesired
target signatures, 5% dry grass as an interferer and 5% red soil as a background signature.
The abundance fractions of the five signatures in these 300 simulated mixed pixels are
shown in Fig. 4.1.
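
The abundance layout described above can be written down directly. The following sketch reproduces the stated fractions; the mixing step at the end uses random stand-in signatures and an assumed band count of 158, since the actual Fig. 1.5 spectra are not reproduced here, and it assumes reflectance is scaled to [0, 1] when setting the noise level for the 30:1 signal-to-noise ratio.

```python
import numpy as np

def simulate_abundances(n_pixels=300):
    """Abundance fractions per pixel; columns are
    blackbrush, creosote leaves, sagebrush, red soil, dry grass."""
    A = np.tile([0.0, 0.45, 0.45, 0.05, 0.05], (n_pixels, 1))
    targets = {50: 0.05, 100: 0.10, 150: 0.20, 200: 0.40, 250: 0.60, 300: 0.80}
    for pixel, frac in targets.items():
        row = pixel - 1                  # pixel numbers in the text are 1-based
        A[row, 0] = frac                 # add blackbrush
        A[row, 1:3] -= frac / 2.0        # reduce both undesired targets evenly
    return A

A = simulate_abundances()

# Mixing step with hypothetical signatures (random stand-ins for the Fig. 1.5 spectra).
rng = np.random.default_rng(0)
n_bands = 158                            # assumed band count
M = rng.random((n_bands, 5))             # one column per signature, reflectance in [0, 1]
sigma = 0.5 / 30.0                       # SNR 30:1 = 50% reflectance / noise std
pixels = A @ M.T + rng.normal(0.0, sigma, size=(A.shape[0], n_bands))
```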




(a) blackbrush (b) creosote leaves (c) sagebrush (d) red soil (e) dry grass

Figure 4.1. 300 simulated mixed pixels

In addition, white Gaussian noise was added to each pixel to achieve a 30:1 signal-
to-noise ratio (SNR), which was defined as 50% reflectance divided by the standard
deviation of the noise (Harsanyi and Chang, 1994). Using blackbrush as the desired
signature d, three variants of TCIMF were implemented depending upon how the
undesired signature matrix U was selected. One used the two undesired signatures,
creosote leaves and sagebrush, to make up the undesired signature matrix U with the
constraint vector chosen to be $(1,0,0)^T$ while the other two used only one undesired
signature to form U with the constraint vector given by $(1,0)^T$. Fig. 4.2(a-c) show their
respective detection results and Fig. 4.2(d) is the detection result of the CEM filter with
blackbrush as the desired signature d while discarding creosote leaves and sagebrush. As
we can see from Fig. 4.2(a), TCIMF detected target pixels 150 (barely detected), 200,
250 and 300 but missed target pixels 50 and 100. The performance of TCIMF was slightly
degraded, as shown in Figs. 4.2(b) and 4.2(c), when only one signature was used for U. The
performance of CEM in Fig. 4.2(d) was the worst because it missed all six target
pixels. This experiment demonstrated the advantage of constraining the undesired target
signature matrix U to zero over minimizing the energy contributed from U.




(c) U = [sagebrush] (d) CEM filter 

Figure 4.2. Detection results of TCIMF and CEM filter with d = blackbrush 



Computer simulations similar to those of Fig. 4.1 were also conducted using creosote
leaves as the desired target signature and U = [blackbrush, sagebrush] as the undesired
targets. The detection results are shown in Fig. 4.3.

(c) U = [sagebrush] (d) CEM filter

Figure 4.3. Detection results of TCIMF and CEM filter with d = creosote leaves

Interestingly, in this case, the detection performance of TCIMF with U = [blackbrush,
sagebrush] was only slightly better than that of CEM and of TCIMF with U made up of
only one undesired target signature. The latter three performed nearly the same, as shown
in Fig. 4.3(b-d), where CEM detected creosote leaves at pixels 200, 250 and 300. However,
when the same simulations were conducted using sagebrush as the desired target
signature and U = [blackbrush, creosote leaves] as the undesired target signatures, the
detection results shown in Fig. 4.4 are very different from those in Figs. 4.2 and 4.3. In
this case, the TCIMF with U = [blackbrush, creosote leaves] and the TCIMF using
U = [creosote leaves] performed nearly the same by extracting sagebrush at pixels 200,
250 and 300. They both performed significantly better than CEM and the TCIMF using
only blackbrush as the undesired target signature, which missed all six targets except in
the case of Fig. 4.4(b), where sagebrush was barely detected at pixel 300
by the TCIMF with U = [blackbrush].




(a) U = [blackbrush, creosote leaves] (b) U = [blackbrush] 




(c) U = [creosote leaves] (d) CEM filter 

Figure 4.4. Detection results of TCIMF and CEM filter with d = sagebrush 

All these phenomena can be explained by the spectral similarity values of the five
targets in Fig. 1.5 tabulated in Tables 2.3-2.6. For example, Table 2.5 (Table 2.6) shows
that blackbrush and sagebrush had the most similar spectral signatures, with an SID-measured
spectral similarity value as small as 0.0063 (0.2263 measured by HMMID). Compared to
the SID-measured spectral similarity values of 0.4970 (1.5390 measured by HMMID)
between blackbrush and creosote leaves, and 0.0303 (1.4102 measured by HMMID)
between creosote leaves and sagebrush, the SID-measured spectral similarity value
between blackbrush and sagebrush is so small that it was very difficult to differentiate
one from the other. Therefore, if one was used as the desired target signature while the
other was used as the undesired target signature, most of the spectral abundance of the
desired target signature was eliminated by TCIMF or minimized by CEM. As a result, the
detection performance was significantly reduced, as shown in Fig. 4.2(c-d) as well as Fig. 4.4(b,d).

4.4.2 Hyperspectral Image Experiments 

In this section, the 15-panel HYDICE scene in Fig. 1.7 was used for experiments
where the five panel signatures in Fig. 1.8 were used for target information. When CEM
was implemented, only the panels of interest were designated as the desired target while
all the information about the other target panels was discarded. By contrast, TCIMF used
the target knowledge not only to pass the desired target panels but also to eliminate the
undesired target panels. The detection results of TCIMF and CEM are shown in Figs. 4.5
and 4.6 respectively. According to Tables 2.13-2.16 the spectral signatures of panels in
rows 4 and 5 are very similar, with an SID-measured spectral similarity value of 0.0017.
As expected, when CEM detected panels in one of these rows, it also picked up a few panel
pixels in the other row. A similar phenomenon was also observed in the detection of panels
in rows 2 and 3 because their spectral signatures are very similar, with an SID-measured
spectral similarity value of 0.0023. By contrast, TCIMF performed better than CEM by
nulling undesired panel pixels. Although the SID-measured similarity value between
P1 and P2, 0.0027, is also small and comparable to 0.0023 between P2 and P3, the SID-measured
similarity value between P1 and P3, 0.0060, is more than twice the SID-measured
similarity value between P1 and P2. In addition, the spectral shape of P1 is not
as close as the spectral shapes among P1 and P2. As a result, both TCIMF and CEM
performed equally well in detecting panel pixels in row 1.




This experiment demonstrates that both SID-similarity values and geometric shapes 
of spectral signatures have significant impacts on target detection performance. 

Since Fig. 1.7(b) provides the ground truth map of the 15 panels, we can actually
compare the performance of CEM and TCIMF to that of the OSP and NCLS detectors in
Chapter 3 by tallying how many red panel pixels are detected correctly. Here we did
not include SCLS because it generally did not perform well compared to
NCLS, as shown in Chapter 3. It should be noted that the images generated by all
four detectors are abundance fractional images, which are gray scale. In this case, we need
a thresholding method to determine whether a pixel is a panel pixel. The abundance percentage
cut-off method proposed in Ren (1998) was used to threshold abundance fractional
images into binary images. It will be defined in (9.3) and discussed in detail in Chapter
9. If the abundance fraction of the desired target signature present in a pixel is above a
certain percentage, the pixel is declared to be a target pixel and assigned 1.
Otherwise, the pixel is considered to be a background pixel and assigned 0.
Tables 4.2-4.5 are detection results obtained by using 50%, 25%, 20% and 10%
abundance as cut-off threshold values respectively. In these tables, N is the total number of
panel pixels, $N_R$ is the number of red panel pixels, $N_{RD}$ is the number of red panel pixels
detected and classified correctly, $N_{FA}$ is the number of false alarm pixels and $R_D$ is the
detection rate defined by $R_D = N_{RD}/N_R$.
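
A minimal sketch of the tallying step is given below. It assumes that the abundance fractions have already been normalized to [0, 1]; the exact normalization used by the cut-off method of (9.3) is not reproduced here, and the function name and toy values are hypothetical.

```python
import numpy as np

def detection_stats(abundance, is_target, cutoff):
    """Threshold an abundance fractional image and tally N_RD, N_FA and R_D."""
    detected = abundance > cutoff                    # 1 = target pixel, 0 = background
    n_r = int(is_target.sum())                       # number of true target pixels
    n_rd = int((detected & is_target).sum())         # targets detected correctly
    n_fa = int((detected & ~is_target).sum())        # false alarms
    return n_rd, n_fa, n_rd / n_r

abundance = np.array([0.90, 0.30, 0.60, 0.10, 0.55])
is_target = np.array([True, True, True, False, False])
n_rd, n_fa, r_d = detection_stats(abundance, is_target, cutoff=0.50)
print(n_rd, n_fa, round(r_d, 4))                     # 2 1 0.6667
```

Lowering `cutoff` can only add detections, which is why the detection rate and the false alarm count both grow from Table 4.2 to Table 4.5.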



Table 4.2. Detection results of OSP, NCLS, CEM, and TCIMF using 50% cut-off abundance

            OSP                 NCLS                CEM                 TCIMF
       N_R  N_RD  R_D     N_FA  N_RD  R_D     N_FA  N_RD  R_D     N_FA  N_RD  R_D     N_FA
P1     3    1     0.3333  6     1     0.3333  2     2     0.6667  0     2     0.6667  0
P2     4    2     0.5000  338   3     0.7500  2     3     0.7500  0     3     0.7500  0
P3     4    3     0.7500  227   3     0.7500  3     3     0.7500  0     3     0.7500  0
P4     4    0     0.0000  443   3     0.7500  2     3     0.7500  0     3     0.7500  0
P5     4    3     0.7500  0     3     0.7500  2     3     0.7500  0     3     0.7500  0
Total  19   9     0.4737  1014  13    0.6842  11    14    0.7368  0     14    0.7368  0



Table 4.3. Detection results of OSP, NCLS, CEM, and TCIMF using 25% cut-off abundance

            OSP                 NCLS                CEM                 TCIMF
       N_R  N_RD  R_D     N_FA  N_RD  R_D     N_FA  N_RD  R_D     N_FA  N_RD  R_D     N_FA
P1     3    2     0.6667  35    2     0.6667  6     2     0.6667  0     2     0.6667  0
P2     4    2     0.5000  1572  3     0.7500  3     4     1.0000  1     4     1.0000  0
P3     4    4     1.0000  712   4     1.0000  4     4     1.0000  0     4     1.0000  0
P4     4    4     1.0000  2046  3     0.7500  5     4     1.0000  0     4     1.0000  0
P5     4    3     0.7500  13    3     0.7500  5     3     0.7500  0     3     0.7500  0
Total  19   17    0.8947  4378  15    0.7895  23    17    0.8947  1     17    0.8947  0




Table 4.4. Detection results of OSP, NCLS, CEM, and TCIMF using 20% cut-off abundance

            OSP                 NCLS                CEM                 TCIMF
       N_R  N_RD  R_D     N_FA  N_RD  R_D     N_FA  N_RD  R_D     N_FA  N_RD  R_D     N_FA
P1     3    3     0.6667  51    2     0.6667  7     3     1.0000  0     3     1.0000  1
P2     4    4     1.0000  1879  4     1.0000  3     4     1.0000  2     4     1.0000  0
P3     4    4     1.0000  774   4     1.0000  5     4     1.0000  0     4     1.0000  0
P4     4    4     1.0000  2468  4     1.0000  5     4     1.0000  0     4     1.0000  0
P5     4    3     0.7500  21    3     0.7500  6     4     1.0000  0     4     1.0000  0
Total  19   18    0.8947  5193  17    0.8947  26    19    1.0000  2     19    1.0000  1




Table 4.5. Detection results of OSP, NCLS, CEM, and TCIMF using 10% cut-off abundance

            OSP                 NCLS                CEM                 TCIMF
       N_R  N_RD  R_D     N_FA  N_RD  R_D     N_FA  N_RD  R_D     N_FA  N_RD  R_D     N_FA
P1     3    3     1.0000  133   3     1.0000  11    3     1.0000  0     3     1.0000  0
P2     4    4     1.0000  2524  4     1.0000  14    4     1.0000  5     4     1.0000  3
P3     4    4     1.0000  944   4     1.0000  12    4     1.0000  2     4     1.0000  0
P4     4    4     1.0000  3393  4     1.0000  9     4     1.0000  3     4     1.0000  0
P5     4    3     0.7500  59    4     1.0000  14    4     1.0000  3     4     1.0000  0
Total  19   18    0.9474  7053  19    1.0000  60    19    1.0000  13    19    1.0000  3

As shown in these tables, CEM and TCIMF performed significantly better than
OSP and NCLS. As the cut-off abundance percentage was decreased from 50% to 10%,
the detection rates of all four detectors improved at the expense of an increase in
false alarm pixels. TCIMF was the best in the sense that it produced the smallest
number of false alarm pixels. In particular, when 20% abundance was used as the cut-off
threshold value, both CEM and TCIMF already reached a 100% detection rate with
only one or two false alarm pixels, while OSP and NCLS achieved a detection
rate of only 0.8947 with many more false alarm pixels.

In order to see how TCIMF performed using different panels as undesired target
signatures, Fig. 4.7(a-c) shows the results of detecting panels in row 5 with U made up
of panels in row 4, panels in rows 3 and 4, and panels in rows 2-4 respectively.




The results shown in Figs. 4.7(a-c) were very close. The reason is that each choice of
U contained the panels in row 4, whose interfering effects were thereby removed to
enhance the detection of the panels in row 5. Whether or not the panels in other rows
were included in U had very little impact on the detection performance since their
spectral signatures were different from those of the panels in row 5.

However, if we removed the panels in row 4 from U in the experiment of Fig. 4.7,
the detection results of the panels in row 5 are shown in Fig. 4.8(a-c), where Fig. 4.8(a)
was precisely the CEM result shown in Fig. 4.6. As we can see, the panels in row 4
were still barely visible, but could not be nulled out as was done in Fig. 4.5 and Figs.
4.7(a-c) by the TCIMF that included the panels in row 4 in U.







4.5 SENSITIVITY OF CEM AND TCIMF TO LEVEL OF TARGET 
INFORMATION 

As noted in Section 4.3, $\delta_{\mathrm{CEM}}(\mathbf{r})$ approximates $P_U^{\perp}$ by $\mathbf{R}_{L\times L}^{-1}$. However, it may not
be the best approximation in terms of information approximation. This is because the
sample correlation matrix $\mathbf{R}_{L\times L}$ includes the desired signature d, which is not included
in $P_U^{\perp}$. So, a better approximation can be achieved by replacing $\mathbf{R}_{L\times L}$ with $\tilde{\mathbf{R}}_{L\times L}$, which
excludes all the pixels specified by the desired target signature d. More specifically, let
A(d) denote the set of pixels in the image data that are specified by d. Then

$$\tilde{\mathbf{R}}_{L\times L} = \frac{1}{N - |A(\mathbf{d})|} \left[ \sum_{\mathbf{r}_i \notin A(\mathbf{d})} \mathbf{r}_i \mathbf{r}_i^T \right]$$

where |A(d)| is the number of target pixels in A(d). As will be demonstrated in computer
simulations, using $\tilde{\mathbf{R}}_{L\times L}$ in place of $\mathbf{R}_{L\times L}$ can improve the performance of $\delta_{\mathrm{CEM}}(\mathbf{r})$.
Removing from $\mathbf{R}_{L\times L}$ the target pixels whose signatures are close and similar to the
desired target signatures can further enhance the performance as well. This was also
demonstrated in Thai and Healey (2002), where the background subspace was generated by
removing those signatures that are close to target signatures so that the background
subspace can be effectively annihilated by an orthogonal projector. Similar conclusions
can also be drawn for the LCMV-based target detector and TCIMF. In many applications,
identifying the target pixels specified by d may be difficult. Under such circumstances,
using the sample correlation matrix $\mathbf{R}_{L\times L}$ instead of $\tilde{\mathbf{R}}_{L\times L}$ may be the simplest option.
Nevertheless, we will show in the following computer simulations that $\tilde{\mathbf{R}}_{L\times L}$ can indeed
improve the detection performance of CEM and TCIMF.
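
The two correlation matrices can be formed side by side as in the sketch below. The pixel matrix, the desired signature and the helper names are hypothetical; the only facts taken from the text are that the second matrix leaves out the pixels in A(d) and that CEM uses the weight vector $\mathbf{R}^{-1}\mathbf{d}/(\mathbf{d}^T\mathbf{R}^{-1}\mathbf{d})$.

```python
import numpy as np

def correlation_with_and_without_targets(X, target_idx):
    """X holds N pixel vectors as columns (L x N); target_idx lists the
    columns belonging to A(d). Returns (R, R_tilde)."""
    N = X.shape[1]
    keep = np.ones(N, dtype=bool)
    keep[target_idx] = False                         # drop the pixels in A(d)
    R = X @ X.T / N
    R_tilde = X[:, keep] @ X[:, keep].T / keep.sum()
    return R, R_tilde

def cem_weights(R, d):
    """CEM weight vector R^{-1} d / (d^T R^{-1} d)."""
    Rinv_d = np.linalg.solve(R, d)
    return Rinv_d / (d @ Rinv_d)

rng = np.random.default_rng(3)
L, N = 10, 400
X = rng.random((L, N))                               # hypothetical background pixels
d = rng.random(L)                                    # hypothetical desired signature
target_idx = np.arange(197, 202)                     # pixel numbers 198-202, 0-based
X[:, target_idx] = 0.9 * X[:, target_idx] + 0.1 * d[:, None]

R, R_tilde = correlation_with_and_without_targets(X, target_idx)
w, w_tilde = cem_weights(R, d), cem_weights(R_tilde, d)
```

Both weight vectors satisfy the same unit constraint on d; they differ only in which pixels contribute to the output energy being minimized.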

4.5.1 Computer Simulations 

The data used were three field reflectance spectra, creosote leaves, dry grass and red
soil, shown in Fig. 1.5. We simulated 400 mixed pixel vectors, $\{\mathbf{r}_i\}_{i=1}^{400}$, as follows.
We started the first pixel vector with 100% red soil and 0% dry grass, then
increased dry grass by 0.25% and decreased red soil by 0.25% with every pixel vector until
the 400th pixel vector, which contained 100% dry grass. We then added creosote leaves to pixel
vector numbers 198-202 at abundance fractions of 10% while reducing the abundance of red
soil and dry grass evenly. For example, after addition of creosote leaves, the resulting
pixel vector 200 contained 10% creosote leaves, 45% red soil and 45% dry grass. White
Gaussian noise was also added to each pixel vector to achieve a 30:1 signal-to-noise ratio,
defined as 50% reflectance divided by the standard deviation of the noise
(Harsanyi and Chang, 1994). Fig. 4.9 shows these 400 simulated mixed pixel vectors.
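
The abundance ramp described above can be sketched as follows. One assumption is made explicit: `np.linspace` is used so that both stated endpoints (0% and 100% dry grass) are met exactly, which gives a step of 1/399 rather than exactly 0.25% per pixel.

```python
import numpy as np

n_pixels = 400
grass = np.linspace(0.0, 1.0, n_pixels)  # pixel 1: 0% dry grass, pixel 400: 100%
soil = 1.0 - grass
creosote = np.zeros(n_pixels)

rows = np.arange(197, 202)               # pixel numbers 198-202, 0-based
creosote[rows] = 0.10                    # add 10% creosote leaves
soil[rows] -= 0.05                       # reduce red soil and dry grass evenly
grass[rows] -= 0.05

# Columns: creosote leaves, dry grass, red soil
A = np.column_stack([creosote, grass, soil])
```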



(a) Abundance of each pixel at band 30 (b) Abundance of each pixel averaged over 158 bands

Figure 4.9. 400 simulated mixed pixel vectors

Using these 400 simulated pixel vectors in Fig. 4.9 as data we can closely examine
the performance of $\delta_{\mathrm{CEM}}(\mathbf{r})$ using $\mathbf{R}_{L\times L}$ and $\tilde{\mathbf{R}}_{L\times L}$ respectively. Fig. 4.10(a-b) shows the
detection results of creosote leaves where the former detected 0.07 of abundance fraction
compared to 0.1 of abundance fraction detected by the latter.




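(a) $\delta_{\mathrm{CEM}}(\mathbf{r})$ using $\mathbf{R}_{L\times L}$ (b) $\delta_{\mathrm{CEM}}(\mathbf{r})$ using $\tilde{\mathbf{R}}_{L\times L}$

Figure 4.10. Results of $\delta_{\mathrm{CEM}}(\mathbf{r})$ using $\mathbf{R}_{L\times L}$ and $\tilde{\mathbf{R}}_{L\times L}$ respectively in detection of creosote leaves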

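Apparently, $\delta_{\mathrm{CEM}}(\mathbf{r})$ using $\tilde{\mathbf{R}}_{L\times L}$ performed better than $\delta_{\mathrm{CEM}}(\mathbf{r})$ using $\mathbf{R}_{L\times L}$ in terms
of abundance fraction detection, where $\tilde{\mathbf{R}}_{L\times L}$ did not include the desired target pixels
198-202, $\{\mathbf{r}_i\}_{i=198}^{202}$. Similar results shown in Fig. 4.11 were also obtained by
$\delta_{\mathrm{TCIMF}}(\mathbf{r})$ using $\tilde{\mathbf{R}}_{L\times L}$ in place of $\mathbf{R}_{L\times L}$, where $\delta_{\mathrm{TCIMF}}(\mathbf{r})$ was implemented by letting d =
creosote leaves and U = [grass, soil]. As we can see, $\delta_{\mathrm{CEM}}(\mathbf{r})$ and $\delta_{\mathrm{TCIMF}}(\mathbf{r})$ using $\tilde{\mathbf{R}}_{L\times L}$
performed better than their counterparts using $\mathbf{R}_{L\times L}$. The advantage of using $\tilde{\mathbf{R}}_{L\times L}$
becomes more evident if the number of target pixels is increased from 5
pixels to 15 pixels, $\{\mathbf{r}_i\}_{i=193}^{207}$, from pixel number 193 to pixel number 207 in Figs.
4.12-4.13, and to 25 pixels, $\{\mathbf{r}_i\}_{i=188}^{212}$, from pixel number 188 to pixel number 212 in
Figs. 4.14-4.15.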




Figure 4.11. Results of $\delta_{\mathrm{TCIMF}}(\mathbf{r})$ using $\mathbf{R}_{L\times L}$ and $\tilde{\mathbf{R}}_{L\times L}$ respectively in detection of creosote leaves

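Figs. 4.12-4.13 show that the advantage of $\delta_{\mathrm{CEM}}(\mathbf{r})$ and $\delta_{\mathrm{TCIMF}}(\mathbf{r})$ using $\tilde{\mathbf{R}}_{L\times L}$ over their
counterparts using $\mathbf{R}_{L\times L}$ was more visible. The detected abundance fractions of the
creosote leaves using $\tilde{\mathbf{R}}_{L\times L}$ were more accurate than those detected by using $\mathbf{R}_{L\times L}$.
Additionally, $\delta_{\mathrm{CEM}}(\mathbf{r})$ and $\delta_{\mathrm{TCIMF}}(\mathbf{r})$ using $\tilde{\mathbf{R}}_{L\times L}$ also suppressed the effects caused by background
signatures (dry grass and soil) more effectively than did $\delta_{\mathrm{CEM}}(\mathbf{r})$ and $\delta_{\mathrm{TCIMF}}(\mathbf{r})$ using $\mathbf{R}_{L\times L}$.

Figure 4.12. Detection results of 15 creosote leaves pixels, $\{\mathbf{r}_i\}_{i=193}^{207}$, from pixel number 193 to pixel
number 207. (a) $\delta_{\mathrm{CEM}}(\mathbf{r})$ using $\mathbf{R}_{L\times L}$; (b) $\delta_{\mathrm{CEM}}(\mathbf{r})$ using $\tilde{\mathbf{R}}_{L\times L}$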




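Figure 4.13. Detection results of 15 creosote leaves pixels, $\{\mathbf{r}_i\}_{i=193}^{207}$, from pixel number 193 to pixel
number 207. (a) $\delta_{\mathrm{TCIMF}}(\mathbf{r})$ using $\mathbf{R}_{L\times L}$; (b) $\delta_{\mathrm{TCIMF}}(\mathbf{r})$ using $\tilde{\mathbf{R}}_{L\times L}$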



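Experiments similar to those of Figs. 4.12-4.13 were also conducted for the case where the
number of target pixels was increased from 15 to 25. The results are shown in Figs.
4.14-4.15. As we can see from these figures, the performance of $\delta_{\mathrm{CEM}}(\mathbf{r})$ and $\delta_{\mathrm{TCIMF}}(\mathbf{r})$
using $\mathbf{R}_{L\times L}$ was degraded due to more interfering effects caused by the additional target
pixels, while the performance of $\delta_{\mathrm{CEM}}(\mathbf{r})$ and $\delta_{\mathrm{TCIMF}}(\mathbf{r})$ using $\tilde{\mathbf{R}}_{L\times L}$ remained nearly the
same.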




Figure 4.14. Detection results of 25 creosote leaves pixels, $\{\mathbf{r}_i\}_{i=188}^{212}$, from pixel number 188 to pixel
number 212. (a) $\delta_{\mathrm{CEM}}(\mathbf{r})$ using $\mathbf{R}_{L\times L}$; (b) $\delta_{\mathrm{CEM}}(\mathbf{r})$ using $\tilde{\mathbf{R}}_{L\times L}$

Figure 4.15. Detection results of 25 creosote leaves pixels, $\{\mathbf{r}_i\}_{i=188}^{212}$, from pixel number 188 to pixel
number 212. (a) $\delta_{\mathrm{TCIMF}}(\mathbf{r})$ using $\mathbf{R}_{L\times L}$; (b) $\delta_{\mathrm{TCIMF}}(\mathbf{r})$ using $\tilde{\mathbf{R}}_{L\times L}$



4.5.2 Hyperspectral Image Experiments 

In Section 4.4, we have seen that CEM and TCIMF worked very effectively for the
15-panel HYDICE scene. This was because the HYDICE image has very fine spatial
resolution (1.5 m) and the target knowledge used represented the panel
signatures accurately. However, if the target information does not represent the targets of
interest well, CEM and TCIMF may fail to detect target pixels that belong to
the same target class. In other words, CEM and TCIMF are very sensitive to the level of
target information that is used in their filter design. Additionally, they are also very
sensitive to noise, which is another issue that will be addressed in Chapter 7.

In order to illustrate the sensitivity of CEM to target knowledge, two experiments
were conducted using the AVIRIS image in Fig. 1.6. Since the AVIRIS image has 20-m
spatial resolution and covers a very large area of LCVF, most image pixels are likely
to be mixed. In this case, the target information directly obtained from a few pixels in the
image scene may not represent the targets of interest well. So, the first experiment used
a single target pixel, marked by white circles in Fig. 4.16, as the target information for
target detection. Fig. 4.16 shows the detection of five targets, cinders, playa, rhyolite, shade
and vegetation. As we can see from this experiment, CEM could only detect the brightest
pixel that was used for target information.




A second experiment used the full-region target information to detect all five targets.
The detection results are shown in Fig. 4.17. Compared to Fig. 4.16, the detection in
Fig. 4.17 was significantly improved. In particular, the whole region of the dry lake was
extracted and the anomaly marked by a white circle in Fig. 3.1(c) was also shown by a
white circle. If we further compare the results in Fig. 4.17 to those in Fig. 3.1 obtained by
OSP, the detection results in Fig. 4.17 were significantly improved.

For TCIMF, similar sensitivity phenomena were also observed since the imprecise
knowledge of undesired signatures did not effectively eliminate their interfering effects.



4.6 REAL-TIME PROCESSING 



One of the advantages of CEM and TCIMF is that the data matrix underlying $\mathbf{R}_{L\times L}$ in the
optimal weights specified by (4.7) and (4.12) can be decomposed into a product of a unitary
matrix Q and an upper triangular matrix R by either the Givens rotations or the
Householder transform. Such a decomposition is called QR-decomposition (Golub and
Van Loan, 1989; Haykin, 1986). By means of QR-decomposition there is no need to
compute the inverse of $\mathbf{R}_{L\times L}$, $\mathbf{R}_{L\times L}^{-1}$, directly.

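Suppose that data processing is carried out line-by-line from left to right and top to
bottom in an image of size $M \times N$, where M is the number of lines (rows) and N is the total
number of pixel vectors in one line (i.e., the number of columns in the image). For each
line t, we form a data matrix $\mathbf{X}_t = [\mathbf{r}_{t1}\ \mathbf{r}_{t2} \cdots \mathbf{r}_{tN}]$ where $\mathbf{r}_{ti}$ is the pixel vector being
visited at time instance i in the t-th line. In this case, the $\mathbf{R}_{L\times L}$ in (4.3) is replaced by the
data correlation matrix of line t in the image, denoted by

$$\boldsymbol{\Sigma}_t = (1/N) \left[ \sum_{i=1}^{N} \mathbf{r}_{ti} \mathbf{r}_{ti}^T \right] = (1/N) \left[ \mathbf{X}_t \mathbf{X}_t^T \right]. \qquad (4.16)$$

Using a QR-decomposition, the matrix $\mathbf{X}_t$ can be expressed by

$$\mathbf{X}_t = \mathbf{Q}_t \mathbf{R}_t. \qquad (4.17)$$

Here $\mathbf{Q}_t$ is a unitary matrix with $\mathbf{Q}_t^{-1} = \mathbf{Q}_t^T$ and $\mathbf{R}_t = [\,(\mathbf{R}_t^{upper})^T\ \mathbf{0}\,]^T$ is not necessarily of full
rank, where 0 is a zero vector. The $\mathbf{R}_t^{upper}$ is an upper triangular matrix

$$\mathbf{R}_t^{upper} = \begin{bmatrix} * & * & \cdots & * \\ 0 & * & \cdots & * \\ \vdots & \ddots & \ddots & \vdots \\ 0 & \cdots & 0 & * \end{bmatrix}$$

and * in $\mathbf{R}_t^{upper}$ is a non-zero element. From (4.16) and (4.17) the inverse of $\boldsymbol{\Sigma}_t$ can be computed as

$$\boldsymbol{\Sigma}_t^{-1} = N \left( \mathbf{X}_t \mathbf{X}_t^T \right)^{-1} = N \left\{ (\mathbf{R}_t^{upper})^{-1} \left[ (\mathbf{R}_t^{upper})^T \right]^{-1} \right\} \qquad (4.18)$$

where the unitary matrix $\mathbf{Q}_t$ is canceled out in (4.18) because $\mathbf{Q}_t^{-1} = \mathbf{Q}_t^T$.
Substituting $\boldsymbol{\Sigma}_t^{-1}$ in (4.18) for $\mathbf{R}_{L\times L}^{-1}$ in the optimal weight vector yields

$$\mathbf{w} = (\mathbf{R}_t^{upper})^{-1} \left[ (\mathbf{R}_t^{upper})^T \right]^{-1} \mathbf{T} \left\{ \mathbf{T}^T (\mathbf{R}_t^{upper})^{-1} \left[ (\mathbf{R}_t^{upper})^T \right]^{-1} \mathbf{T} \right\}^{-1} \mathbf{c} \qquad (4.19)$$

where $\mathbf{T}$ stands for the constrained target signature matrix (d for CEM and [D U] for TCIMF).
Since $\mathbf{R}_t^{upper}$ is an upper triangular matrix, so is $(\mathbf{R}_t^{upper})^{-1}$. Therefore, (4.19) does not
require computing $\boldsymbol{\Sigma}_t^{-1}$. As a result, it can be realized by systolic array implementation
(Haykin, 1986) and implemented in real-time processing. For details of the systolic array
implementation we refer to Haykin (1986), Chang and Althouse (1991) and Ward
et al. (1986).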
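
The following NumPy sketch illustrates how the inverse correlation matrix can be avoided for the CEM weight vector. It assumes that the QR-decomposition is applied to the transposed data matrix (pixels as rows), which is the convention under which the unitary factor cancels and only the triangular factor remains; the data are random stand-ins and the general-purpose `np.linalg.solve` is used where a dedicated triangular solver would exploit the structure.

```python
import numpy as np

rng = np.random.default_rng(2)
L, N = 8, 64                             # hypothetical band and pixel counts
X = rng.random((L, N))                   # pixel vectors of one line as columns
d = rng.random(L)                        # hypothetical desired signature

# Reduced QR of X^T: Q is N x L, R_up is L x L upper triangular,
# so that X X^T = R_up^T R_up and Q drops out.
Q, R_up = np.linalg.qr(X.T)

# Apply (R_up)^{-1} [(R_up)^T]^{-1} to d with two triangular solves.
y = np.linalg.solve(R_up.T, d)           # forward substitution step
z = np.linalg.solve(R_up, y)             # back substitution step
w_cem = z / (d @ z)                      # the factor N cancels in the ratio

# Reference computed directly from the sample correlation matrix.
Sigma = X @ X.T / N
w_ref = np.linalg.solve(Sigma, d)
w_ref /= d @ w_ref
```

The same two solves applied to each column of the constrained signature matrix give the TCIMF weight vector.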

Since CEM and TCIMF share the same detector structure, only CEM will be used to 
demonstrate its capability of real-time implementation. The HYDICE image scene in 
Fig. 1.7 was used for experiments where the five panel signatures, P1-P5 in Fig. 1.8 was 
used for target information in CEM. Figs. 4.8-4.12 show the line-by-line real-time 
processing of CEM in detecting panels in five different rows respectively where the line 
number underneath each figure is the line currently being examined. As data processing 
began, no desired panel pixel detected. The image looked like random noise as shown in 
the images labeled by (a) in Figs. 4.18-4.22. However, as soon as desired panel pixels 
were detected, the image background suddenly turned to darkness and only pixels 
detected by CEM are highlighted by brightness as shown in images labeled by (b-d) in 
Figs 4.18-4.22. As the process proceeded, the image background remained dark while 
CEM continuing to look for pixels that matched the desired panel signature as shown in 
images labeled by (e) in Figs. 4.18-4.22. Finally, the whole process completed when the 
last pixel in the image is examined as shown in images labeled by (f) in Figs. 4.18-4,22. 
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Several comments are noteworthy. 

1) When the sample correlation matrix, is formed by the entire image, it can only 
capture the global properties within the image. By contrast, if is formed by 
lines, the information is updated line-by-line. As a result, it can characterize local 
properties. So, its performance , is generally better than the one using global 
properties. 

2) The CEM implemented by the above line-by-line real-time processing only uses the 
information up to the line currently being processed. There is another real-time 
implementation, referred to as pixel-by-pixel real-time processing, which utilizes the 
information up to the pixel currently being examined. Both can therefore be 
viewed as causal processing. Since there is no significant difference between these 
two causal implementations for CEM, only the line-by-line real-time processing was 
demonstrated in this section. However, we will show in Section 6.4 that there is a 
noticeable difference in anomaly detection between the line-by-line real-time 
processing and the pixel-by-pixel real-time processing. 

3) When the entire image is used for processing, N is the total number of pixel vectors 
in the image. If the data are processed line by line, N is the total number of pixel 
vectors up to the line currently being visited by CEM. In our HYDICE experiments, 
the scene has a size of 64 x 64 and the number of pixels in each line is 64 < 210, so a 
sample correlation matrix formed from a single line would be singular. In order to avoid 
this singularity problem, CEM used the first four lines to start its real-time processing. 
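
The causal line-by-line processing described in comments 1)-3) can be sketched as follows. This is only a minimal illustration under stated assumptions, not the implementation used for the experiments: the function name `cem_line_by_line`, the array layout (lines x pixels x bands) and the `start_lines` parameter are hypothetical, and the CEM weight vector is taken in its standard form, the inverse sample correlation matrix applied to d and scaled so that the filter responds with 1 to d itself.

```python
import numpy as np

def cem_line_by_line(cube, d, start_lines=4):
    """Causal CEM sketch: each line is filtered with a sample correlation
    matrix built only from the lines seen so far.

    cube : array of shape (lines, pixels_per_line, bands)   (assumed layout)
    d    : desired target signature, shape (bands,)
    start_lines : number of lines accumulated before filtering starts,
                  chosen so that the correlation matrix is not singular
    """
    lines, pixels, bands = cube.shape
    out = np.zeros((lines, pixels))
    r_sum = np.zeros((bands, bands))    # running sum of r r^T
    n = 0                               # N: pixel vectors seen so far
    for i in range(lines):
        x = cube[i]                     # pixel vectors on the current line
        r_sum += x.T @ x
        n += pixels
        if i + 1 < start_lines:
            continue                    # too few samples, keep accumulating
        R = r_sum / n                   # sample correlation matrix up to line i
        w = np.linalg.solve(R, d)       # R^{-1} d
        w = w / (d @ w)                 # scale so that w^T d = 1
        # the first usable matrix is also applied to the start-up lines
        first = 0 if i + 1 == start_lines else i
        out[first:i + 1] = cube[first:i + 1] @ w
    return out
```

With 64 pixels per line and `start_lines=4`, the first filter in this sketch is built from 256 pixel vectors, which mirrors the start-up described in comment 3).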

4.7 CONCLUSIONS 

A linearly constrained minimum variance-based subpixel approach is presented in 
this chapter, which takes advantage of the strengths of CEM while remedying its 
weaknesses. It divides targets of interest into a class of desired targets and another class 
of undesired targets, while interference is treated as a separate third class of signal source. 
With this 3-component signal source model, TCIMF is designed to achieve detection of 
desired targets, annihilation of undesired targets and minimization of interfering effects 
all together in one single operation. It improves on CEM in the sense that the effects of 
undesired targets are eliminated rather than merely minimized. Since CEM can detect only one 
target at a time, only one single target signature is used for experiments to have a fair 
comparative analysis between TCIMF and CEM. However, in order for CEM to be used as 
a classifier, CEM must be implemented multiple times to classify different targets. This 
issue was investigated in Chang (2002b) and will be discussed in Chapter 11. Compared 
to CEM, TCIMF requires only one implementation to detect and classify multiple 
targets, in the same manner as LCMV (Chang and Ren, 1999). As seen 
in Fig. 1.7(a), obtaining information about targets of interest without ground truth is 
extremely difficult. To cope with this problem, developing an unsupervised CEM and 
TCIMF that do not rely on ground truth is highly desirable. Several unsupervised 
learning methods will be discussed in Chapter 5 (Chang et al., 1998a; Brumbley and 
Chang, 1999; Chang and Du, 1999; Chang and Ren, 2000; Chang and Heinz, 2000). 

AUTOMATIC SUBPIXEL DETECTION: 
UNSUPERVISED SUBPIXEL 
DETECTION 



The subpixel detection studied in Chapters 3 and 4 requires either full or partial 
target knowledge. In the target abundance-constrained subpixel detection, a linear mixture 
model is required where the complete knowledge of the target signature matrix M must 
be given a priori. In the target signature-constrained subpixel detection, there is no need 
for a linear mixture model, but the desired target signature d must be provided in 
advance. In many practical applications, such prior knowledge may not be available. One 
way to remedy this situation is to obtain target information directly from the image data 
in an unsupervised manner. This chapter considers unsupervised subpixel detection, for which 
three unsupervised learning algorithms are presented: the unsupervised vector quantization (UVQ) 
algorithm, the unsupervised target generation process (UTGP) and the unsupervised NCLS 
(UNCLS) algorithm. These algorithms extract the necessary target information 
directly from the image data for unsupervised subpixel detection when no prior target 
information is available. Such generated unsupervised target information is referred to as 
a posteriori target information and can also be used to perform target classification, as 
will be discussed in Chapter 13. One dilemma associated with these unsupervised 
algorithms is how much target information is sufficient for detection and classification. 
This is a challenging problem. On many occasions, it is closely related to the virtual 
dimensionality, which will be discussed in Chapter 17. 



5.1 INTRODUCTION 

The need for unsupervised subpixel detection arises from the fact that many unknown 
signal sources that may affect subpixel detection performance cannot be identified a 
priori. In particular, some of these unknown sources may be considered as interference to 
the detection of the targets in which we are interested. In this case, eliminating these 
interferers will certainly improve our desired target detection performance. In order to 
do so, we must find these interferers, which can be included as the undesired target 
signatures in the target signature matrix M. More precisely, a well-represented target 
signature matrix M must include all image endmembers that completely characterize the 
image data, which may include image background signatures. Apparently, finding such a 
representative set of image endmembers for M is rather difficult, specifically for 
hyperspectral imagery, where many subtle material substances may be present and cannot 
be known in advance. This chapter addresses this issue. It presents three unsupervised 
algorithms that can be used to generate the necessary target information for M. The first 
algorithm is an unsupervised vector quantization (UVQ) algorithm that uses the well-known 
Linde-Buzo-Gray (LBG) algorithm (Linde et al., 1980) to produce a codebook 
with each codeword considered to be a target signature. Then a desired target signature 
matrix M can be formed by the codewords in the UVQ-generated codebook. The second 
algorithm is an unsupervised target generation process (UTGP), which uses a series of 
orthogonal projections to generate potential targets from the image data. These 
orthogonal projection-generated targets are then used to form a desired target signature 
matrix M. The third algorithm is an unsupervised least-squares error based algorithm, which 
takes advantage of the NCLS algorithm developed in Chapter 3 to generate a set of target 
signatures to form a desired target signature matrix M. These three algorithms can be 
implemented in conjunction with supervised subpixel detection methods such as those 
discussed in Chapter 3 to achieve unsupervised subpixel detection. 



5.2 UNSUPERVISED VECTOR QUANTIZATION (UVQ)-BASED ALGORITHM 



One of the most widely used unsupervised methods is the nearest neighbor rule (NNR)-based 
clustering. It has been shown that with an unlimited number of samples, the error 
rate generated by NNR is never worse than twice the error rate generated by the supervised 
Bayes rule (Duda and Hart, 1973, pp. 98-103). In this section, we describe an NNR-based 
unsupervised approach that is derived from the LBG vector quantization algorithm 
(Linde et al., 1980). 

Suppose that the number of clusters is p and $\{\mathbf{x}_i\}_{i=1}^{N}$ is a set of data samples. The 
LBG algorithm, also known as the generalized Lloyd algorithm (Lloyd, 1982), is 
designed to find an optimal set of p clusters based on the k-means NNR (Duda and Hart, 
1973, pp. 103-104). It is an iterative algorithm and can be briefly summarized as 
follows. 

The LBG algorithm starts with an initial codebook, $\mathrm{code}^{(0)} = \{\boldsymbol{\mu}_j^{(0)}\}_{j=1}^{p}$, which can be 
generated by an algorithm developed by Kasvounides et al. (1994). Using the initial codebook 
$\mathrm{code}^{(0)}$, partition the data set $\{\mathbf{x}_i\}_{i=1}^{N}$ into a set of p clusters according to 
the k-means NNR. Let $R_j^{(0)}$ denote the resulting j-th cluster. If a tie occurs, the data 
sample is assigned to the cluster with the lowest index. Then we calculate the center of 
each of the new clusters by 

$$\boldsymbol{\mu}_j^{(1)} = E\left[\mathbf{X} \mid \mathbf{X} \in R_j^{(0)}\right] \quad \text{for } 1 \le j \le p \qquad (5.1)$$

where $\mathbf{X}$ is a random vector representing the data sample set $\{\mathbf{x}_i\}_{i=1}^{N}$. These newly 
generated cluster centers $\{\boldsymbol{\mu}_j^{(1)}\}_{j=1}^{p}$ are again used to repartition the data set $\{\mathbf{x}_i\}_{i=1}^{N}$ 
by the k-means NNR into a new set of p clusters, denoted by $\{R_j^{(1)}\}_{j=1}^{p}$. Using (5.1) 
again results in another new set of cluster centers, $\{\boldsymbol{\mu}_j^{(2)}\}_{j=1}^{p}$. The entire procedure is 
repeated until a stopping rule is satisfied. Here, the minimum squared Euclidean 
distance between a sample and its nearest cluster center is used as a measure to stop 
the procedure. Details of implementing the above procedure are given below. 

Unsupervised Vector Quantization (UVQ) Algorithm 

1. Initialization: $\mathrm{code}^{(0)} = \{\boldsymbol{\mu}_j^{(0)}\}_{j=1}^{p}$ is generated by Kasvounides et al.'s algorithm 
(Kasvounides et al., 1994). Let $\{R_j^{(0)}\}_{j=1}^{p}$ denote the initial clusters resulting from 
partitioning $\{\mathbf{x}_i\}_{i=1}^{N}$ using the k-means NNR. 

2. At iteration k, calculate a new set of cluster centers $\{\boldsymbol{\mu}_j^{(k)}\}_{j=1}^{p}$ from the previous clusters 
$\{R_j^{(k-1)}\}_{j=1}^{p}$ using the following equation, 

$$\boldsymbol{\mu}_j^{(k)} = E\left[\mathbf{X} \mid \mathbf{X} \in R_j^{(k-1)}\right] \qquad (5.2)$$

3. The new set of cluster centers $\{\boldsymbol{\mu}_j^{(k)}\}_{j=1}^{p}$ is used to repartition $\{\mathbf{x}_i\}_{i=1}^{N}$ to form 
new clusters $\{R_j^{(k)}\}_{j=1}^{p}$ using the k-means NNR. In other words, we compute the 
minimum squared Euclidean distance between the data samples and their nearest 
cluster centers. 

4. The reclustering process is terminated when the minimum squared Euclidean 
distance resulting from two consecutive iterations is not further reduced or falls below a 
prescribed error threshold. In this case, no more data samples are shuffled from one 
cluster to another. 
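
The four steps above can be sketched in a few lines. The sketch below is an illustration under assumptions rather than the implementation used in this chapter: the function name `uvq`, its parameters and the use of the total squared distance as the stopping measure are hypothetical choices, and the initial codebook is simply supplied by the caller instead of being produced by the algorithm of Kasvounides et al. (1994).

```python
import numpy as np

def uvq(samples, init_centers, tol=1e-6, max_iter=100):
    """LBG / k-means style clustering sketch.

    samples      : array (N, L) of data samples x_i
    init_centers : array (p, L), the initial codebook (supplied by the caller)
    Returns (centers, labels), where labels[i] is the cluster index of x_i.
    """
    centers = np.asarray(init_centers, dtype=float).copy()
    prev = np.inf
    for _ in range(max_iter):
        # squared Euclidean distance of every sample to every center
        d2 = ((samples[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)       # a tie goes to the lowest index
        distortion = d2[np.arange(len(samples)), labels].sum()
        if prev - distortion <= tol:     # no further reduction: stop
            break
        prev = distortion
        for j in range(len(centers)):    # cluster means, as in (5.2)
            members = samples[labels == j]
            if len(members):             # an empty cluster keeps its center
                centers[j] = members.mean(axis=0)
    return centers, labels
```

The rows of the returned `centers` play the role of codewords, which can then be stacked as columns to form a target signature matrix M.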

Note that the above UVQ algorithm was successfully applied to unsupervised target 
detection for hyperspectral images (Brumbley and Chang, 1999) where the UVQ 
algorithm was used to find unknown interferers in an image scene so that they can be 
eliminated prior to target detection. 



5.3 UNSUPERVISED TARGET GENERATION PROCESS (UTGP) 

As indicated in the introduction, identifying unknown target signatures is almost 
impossible and prohibitive in practice. Recently, an unsupervised OSP was developed by 
Ren and Chang (2000) for this purpose. The algorithm, referred to as the unsupervised target 
generation process (UTGP), was designed to extract unidentified targets directly from an 
unknown image scene and can be briefly described as follows. 

There are two ways to implement UTGP depending upon how an initial target 
signature, denoted by $\mathbf{t}_0$, is chosen. If there is a target, such as a man-made object 
(vehicle), in an image scene provided by partial knowledge or visual inspection, this 
target can be chosen for the initial target signature $\mathbf{t}_0$. If there is no prior information 
available about the image scene, we select a pixel vector with the maximum length (i.e., 
a pixel with the highest intensity or the brightest pixel) as an initial target signature $\mathbf{t}_0$. 
In this case, $\mathbf{t}_0$ may not necessarily correspond to a target of interest. It could be an 
interfering signature, such as rocks, a scratch resulting from a sensor, etc. But the 
purpose of selecting $\mathbf{t}_0$ is only to initialize UTGP. 

After the initial target signature $\mathbf{t}_0$ is chosen, we then employ an orthogonal subspace 
projector $P_{\mathbf{t}_0}^{\perp}$ via (3.3) to project all image pixel vectors into $\langle \mathbf{t}_0 \rangle^{\perp}$, the orthogonal 
complement space of $\langle \mathbf{t}_0 \rangle$. Then the pixel vector with the maximum length in $\langle \mathbf{t}_0 \rangle^{\perp}$ 
is selected as a first target signature, denoted by $\mathbf{t}_1$. The $\mathbf{t}_1$ differs from $\mathbf{t}_0$ in the 
sense that $\mathbf{t}_0$ was not generated from OSP, but chosen by partial knowledge or 
intelligently, such as a pixel with the highest intensity. To avoid selecting similar 
targets, $\mathbf{t}_0$ and $\mathbf{t}_1$ must be as distinct as possible. In doing so, we select $\mathbf{t}_1$ as the 
one with the maximum length in the space $\langle \mathbf{t}_0 \rangle^{\perp}$. In this case, $\mathbf{t}_1$ should have the most 
distinct features from $\mathbf{t}_0$ in terms of orthogonal projection. Next, we need a stopping rule 
to determine whether to terminate UTGP. One criterion is to consider $\mathbf{t}_0$ as the 
desired signature and $\mathbf{t}_1$ as an undesired signature, then calculate the orthogonal 
projection correlation index (OPCI) defined by 

$$\mathrm{OPCI}(\mathbf{t}_0, \mathbf{U}_1) = \eta_1 = \mathbf{t}_0^T P_{\mathbf{U}_1}^{\perp} \mathbf{t}_0 \qquad (5.3)$$

where $\mathbf{U}_1 = [\mathbf{t}_1]$. The OPCI is used to measure the orthogonal projection residual resulting 
from the projection of $\mathbf{t}_0$ in the direction of $\mathbf{t}_1$. If $\mathrm{OPCI}(\mathbf{t}_0, \mathbf{t}_1)$ is sufficiently small, the 
process of generating additional target signatures is terminated. At this stage, the 
task of UTGP is completed. If $\mathrm{OPCI}(\mathbf{t}_0, \mathbf{t}_1)$ is greater than a prescribed threshold $\varepsilon$, we 
continue the above process to generate a second target by applying $P_{[\mathbf{t}_0\,\mathbf{t}_1]}^{\perp}$ again to the 
original image. As a result, all image pixel vectors are projected into a space $\langle \mathbf{t}_0, \mathbf{t}_1 \rangle^{\perp}$ that is 
orthogonal to the linear space spanned by the target signatures $\mathbf{t}_0$ and $\mathbf{t}_1$. As before, a 
pixel vector with the maximum length in the space $\langle \mathbf{t}_0, \mathbf{t}_1 \rangle^{\perp}$ is selected as a 
second target signature, denoted by $\mathbf{t}_2$. Once again, we calculate the OPCI between $\mathbf{t}_0$ 
and $\mathbf{U}_2$, 

$$\eta_2 = \mathrm{OPCI}(\mathbf{t}_0, \mathbf{U}_2) = \mathbf{t}_0^T P_{\mathbf{U}_2}^{\perp} \mathbf{t}_0 \qquad (5.4)$$

where $\mathbf{U}_2 = [\mathbf{t}_1\ \mathbf{t}_2]$. If $\eta_2$ is less than $\varepsilon$, UTGP is terminated. Otherwise, the same 
procedure is repeated to find a third target signature, a fourth target signature, etc., until 
at the k-th step, $\mathrm{OPCI}(\mathbf{t}_0, \mathbf{U}_k)$ is small enough and less than $\varepsilon$. 

More precisely, let $\{\mathbf{t}_1, \mathbf{t}_2, \ldots, \mathbf{t}_k\}$ be the k-th target set generated by UTGP at the k-th 
step and the matrix $\mathbf{U}_k = [\mathbf{t}_1\ \mathbf{t}_2 \cdots \mathbf{t}_k]$ be a target signature matrix formed by the k target 
signatures with $\mathbf{t}_1$ as its first column vector, $\mathbf{t}_2$ as its second column vector, ..., $\mathbf{t}_k$ as 
its last column vector. The OPCI between $\mathbf{t}_0$ and $\mathbf{U}_k = [\mathbf{t}_1\ \mathbf{t}_2 \cdots \mathbf{t}_k]$ is defined by 

$$\eta_k = \mathrm{OPCI}(\mathbf{t}_0, \mathbf{U}_k) = \mathbf{t}_0^T P_{\mathbf{U}_k}^{\perp} \mathbf{t}_0 \qquad (5.5)$$

where $P_{\mathbf{U}_k}^{\perp} = \mathbf{I} - \mathbf{U}_k \mathbf{U}_k^{\#}$ and $\mathbf{U}_k^{\#} = (\mathbf{U}_k^T \mathbf{U}_k)^{-1} \mathbf{U}_k^T$. Using the OPCI given by (5.5) as a stopping 
rule, UTGP can be summarized below. 

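Unsupervised Target Generation Process (UTGP) 

1. Initial condition: select a pixel vector with the maximum length as an initial target 
signature denoted by $\mathbf{t}_0$, i.e., 

$$\mathbf{t}_0 = \arg\left\{\max_{\mathbf{r}} \left[\mathbf{r}^T \mathbf{r}\right]\right\}. \qquad (5.6)$$

Set $k \leftarrow 1$ and $\mathbf{U}_0 = \emptyset$. 

2. Find the orthogonal projections of all image pixel vectors with respect to $\mathbf{t}_0$ by using 
$P_{\mathbf{t}_0}^{\perp} = \mathbf{I} - \mathbf{t}_0 \mathbf{t}_0^{\#}$, where $\mathbf{t}_0^{\#}$ is the pseudo-inverse of $\mathbf{t}_0$. 

3. Select a first target signature in the space $\langle \mathbf{t}_0 \rangle^{\perp}$, denoted by $\mathbf{t}_1$, by finding 

$$\mathbf{t}_1 = \arg\left\{\max_{\mathbf{r}} \left[\left(P_{\mathbf{t}_0}^{\perp}\mathbf{r}\right)^T \left(P_{\mathbf{t}_0}^{\perp}\mathbf{r}\right)\right]\right\}. \qquad (5.7)$$

4. Use (5.3) to calculate $\eta_1 = \mathbf{t}_0^T P_{\mathbf{U}_1}^{\perp} \mathbf{t}_0$ with $\mathbf{U}_1 = [\mathbf{t}_1]$. If $\eta_1 < \varepsilon$, go to step 7. 
Otherwise, let $k \leftarrow k + 1$ and continue. 

5. Find the k-th target signature $\mathbf{t}_k$ generated at the k-th step in the space $\langle \mathbf{t}_0, \mathbf{U}_{k-1} \rangle^{\perp}$ 
with $\mathbf{U}_{k-1} = [\mathbf{t}_1\ \mathbf{t}_2 \cdots \mathbf{t}_{k-1}]$, which is orthogonal to the space linearly spanned by the 
initial target signature $\mathbf{t}_0$ and the (k-1)st target signature set $\{\mathbf{t}_1, \mathbf{t}_2, \ldots, \mathbf{t}_{k-1}\}$: 

$$\mathbf{t}_k = \arg\left\{\max_{\mathbf{r}} \left[\left(P_{[\mathbf{t}_0 \mathbf{U}_{k-1}]}^{\perp}\mathbf{r}\right)^T \left(P_{[\mathbf{t}_0 \mathbf{U}_{k-1}]}^{\perp}\mathbf{r}\right)\right]\right\} \qquad (5.8)$$

where $P_{[\mathbf{t}_0 \mathbf{U}_{k-1}]}^{\perp}$ is defined by (3.3) and $[\mathbf{t}_0\ \mathbf{U}_{k-1}]$ is a matrix made up of $\mathbf{t}_0$ and 
$\mathbf{U}_{k-1} = [\mathbf{t}_1\ \mathbf{t}_2 \cdots \mathbf{t}_{k-1}]$. Let $\mathbf{U}_k = [\mathbf{t}_1\ \mathbf{t}_2 \cdots \mathbf{t}_k]$ be the target signature matrix generated at 
the k-th step. 

6. Stopping rule: calculate $\eta_k = \mathrm{OPCI}(\mathbf{t}_0, \mathbf{U}_k)$ specified by (5.5) and compare it to the 
prescribed threshold $\varepsilon$. If $\eta_k > \varepsilon$, let $k \leftarrow k + 1$ and go to step 5. Otherwise, continue. 

7. At this point, UTGP is terminated and the target signatures generated so far form the final target 
signature set for detection. 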
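
The steps of UTGP can be sketched as follows. This is a minimal illustration under assumptions, not the code used for the experiments: the names `orth_proj` and `utgp`, the row-per-pixel data layout and the `max_targets` safeguard are hypothetical, and steps 3-6 are folded into one loop since (5.7) is the k = 1 case of (5.8).

```python
import numpy as np

def orth_proj(U):
    """P_U^perp = I - U U^#, with the signatures as the columns of U."""
    return np.eye(U.shape[0]) - U @ np.linalg.pinv(U)

def utgp(pixels, eps, max_targets=50):
    """UTGP sketch.

    pixels : array (N, L), one image pixel vector r per row
    eps    : prescribed OPCI threshold
    Returns (t0, targets, opci), with targets = [t1, t2, ...] and
    opci = [eta_1, eta_2, ...].
    """
    t0 = pixels[np.argmax((pixels ** 2).sum(axis=1))]    # (5.6): brightest pixel
    targets, opci = [], []
    while len(targets) < max_targets:
        basis = np.column_stack([t0] + targets)          # [t0 U_{k-1}]
        # the projector is symmetric, so each row below is (P r)^T
        residual = pixels @ orth_proj(basis)
        tk = pixels[np.argmax((residual ** 2).sum(axis=1))]   # (5.7)/(5.8)
        targets.append(tk)
        U = np.column_stack(targets)                     # U_k = [t1 ... tk]
        eta = float(t0 @ orth_proj(U) @ t0)              # (5.5)
        opci.append(eta)
        if eta < eps:                                    # stopping rule
            break
    return t0, targets, opci
```

Setting `eps` to 0 and `max_targets` to a preset number turns the sketch into the variant that stops after a fixed number of targets.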

It should be noted that at each iteration from step 5 to step 6 UTGP generates and 
detects one target at a time. The detected targets may not be significant and could be 
something else, such as natural background signatures, clutter or interfering signatures. 
This is a consequence of the unsupervised nature of UTGP. 

One comment on OPCI is useful for the implementation of UTGP. The OPCI only 
provides a guide to terminate UTGP. There is no optimal number of targets that 
UTGP is required to generate. How many targets need to be generated by UTGP is determined 
by the prescribed error threshold $\varepsilon$ set for OPCI in step 6. It can be determined 
empirically. Another way to terminate UTGP is to preset the number of targets to 
generate. For example, we can estimate the virtual dimensionality (VD) using the 
techniques developed in Chapter 17. The value of VD can be used to terminate the 
algorithm. In this case, there is no need to calculate OPCI as a stopping criterion in 
step 6. 



5.4 UNSUPERVISED NCLS (UNCLS) ALGORITHM 

The unsupervised method presented in this section is based on the least squares error 
(LSE) criterion, which measures the goodness of fit between a linear mixture model and 
the measurements estimated by the NCLS algorithm described in Chapter 3. The resultant 
approach is referred to as the unsupervised NCLS (UNCLS) algorithm. 

Initially, we can select any arbitrary pixel vector as an initial target signature denoted 
by $\mathbf{t}_0$. However, like UTGP, a good choice may be a pixel vector with the maximum 
length that corresponds to the brightest pixel in the image scene. Then the NCLS 
algorithm is used to estimate the abundance fraction of $\mathbf{t}_0$, denoted by $\hat{\alpha}_0^{(1)}(\mathbf{r})$, for each pixel 
vector r in the scene, and the LSE is further calculated between the image pixel vector r 
and its estimate $\hat{\alpha}_0^{(1)}(\mathbf{r})\mathbf{t}_0$, i.e. 

$$\mathrm{LSE}^{(0)}(\mathbf{r}) = \left(\mathbf{r} - \hat{\alpha}_0^{(1)}(\mathbf{r})\mathbf{t}_0\right)^T \left(\mathbf{r} - \hat{\alpha}_0^{(1)}(\mathbf{r})\mathbf{t}_0\right). \qquad (5.9)$$

Here, r is included in the abundance fraction estimate $\hat{\alpha}_0^{(1)}(\mathbf{r})$ to emphasize that the 
estimated abundance fraction is a function of the pixel vector r and varies with r. The 
pixel that yields the maximum LSE is then selected as the next target signature, denoted by 
$\mathbf{t}_1$, obtained by 

$$\max_{\mathbf{r}} \mathrm{LSE}^{(0)}(\mathbf{r}) = \mathrm{LSE}^{(0)}(\mathbf{t}_1) = \left(\hat{\alpha}_0^{(1)}(\mathbf{t}_1)\mathbf{t}_0 - \mathbf{t}_1\right)^T \left(\hat{\alpha}_0^{(1)}(\mathbf{t}_1)\mathbf{t}_0 - \mathbf{t}_1\right). \qquad (5.10)$$

Because the LSE between $\mathbf{t}_0$ and $\mathbf{t}_1$ is the maximum, it can be expected that $\mathbf{t}_1$ 
is most dissimilar to $\mathbf{t}_0$. In order to find a second target signature, the UNCLS 
algorithm estimates the abundance fractions of $\mathbf{t}_0$ and $\mathbf{t}_1$ contained in each pixel vector r 
in the image scene, denoted by $\hat{\alpha}_0^{(2)}(\mathbf{r})$ and $\hat{\alpha}_1^{(2)}(\mathbf{r})$. Then the maximum LSE is found between 
all image pixel vectors r and the least squares linear mixture $\hat{\alpha}_0^{(2)}(\mathbf{r})\mathbf{t}_0 + \hat{\alpha}_1^{(2)}(\mathbf{r})\mathbf{t}_1$ 
estimated by the NCLS algorithm. Once again the pixel vector that yields the maximum 
LSE is selected as a second target signature denoted by $\mathbf{t}_2$. The same procedure of using 
the NCLS algorithm with $\mathbf{M} = [\mathbf{t}_0\ \mathbf{t}_1 \cdots \mathbf{t}_k]$ is repeated until the resulting LSE is small 
enough and less than a prescribed error threshold. As noted in UTGP, if there is partial 
knowledge available a priori, it can be incorporated in the above process. For example, if 
we know nothing but the desired target signature d, the initial target pixel signature $\mathbf{t}_0$ 
can be replaced by this d. If there is more than one known target in the image scene, we 
can select these target signatures as an initial target signature set and then follow the 
same procedure described above until the LSE meets a stopping criterion. The procedure 
outlined above is called the Unsupervised NCLS (UNCLS) algorithm, which can be 
summarized as follows. 

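Unsupervised NCLS (UNCLS) Algorithm 

1. Initial condition: 
Select $\varepsilon$ to be a prescribed error threshold and let $\mathbf{t}_0 = \arg\{\max_{\mathbf{r}} [\mathbf{r}^T \mathbf{r}]\}$ where r 
runs over all image pixel vectors. Let k = 0. 

2. Let $k \leftarrow k + 1$ and apply the NCLS algorithm with the signature matrix 
$\mathbf{M} = [\mathbf{t}_0\ \mathbf{t}_1 \cdots \mathbf{t}_{k-1}]$ to estimate the abundance fractions of $\mathbf{t}_0, \mathbf{t}_1, \ldots, \mathbf{t}_{k-1}$, 
denoted by $\hat{\alpha}_0^{(k)}(\mathbf{r}), \hat{\alpha}_1^{(k)}(\mathbf{r}), \ldots, \hat{\alpha}_{k-1}^{(k)}(\mathbf{r})$. 

3. Find the maximum least squares error defined by 

$$\max_{\mathbf{r}} \mathrm{LSE}^{(k-1)}(\mathbf{r}) = \max_{\mathbf{r}} \left\{ \left(\mathbf{r} - \sum_{j=0}^{k-1} \hat{\alpha}_j^{(k)}(\mathbf{r})\mathbf{t}_j\right)^T \left(\mathbf{r} - \sum_{j=0}^{k-1} \hat{\alpha}_j^{(k)}(\mathbf{r})\mathbf{t}_j\right) \right\} \qquad (5.11)$$

If $\mathrm{LSE}^{(k-1)}(\mathbf{r}) < \varepsilon$ for all r, the algorithm stops; otherwise continue. 

4. Find $\mathbf{t}_k = \arg\{\max_{\mathbf{r}} \mathrm{LSE}^{(k-1)}(\mathbf{r})\}$. Go to step 2. 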
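
The UNCLS loop can be sketched as follows. This is only an illustration under clearly stated assumptions: the NCLS algorithm of Chapter 3 is not reproduced here, and a simple projected-gradient non-negative least squares routine (the hypothetical helper `nnls_all`) is swapped in for it; the function names, the row-per-pixel data layout, the iteration count and the `max_targets` safeguard are likewise choices made for this sketch.

```python
import numpy as np

def nnls_all(M, pixels, n_iter=500):
    """Stand-in for NCLS: for every pixel r (a row of pixels), approximately
    minimize ||r - M a||^2 subject to a >= 0 by projected gradient descent."""
    G = M.T @ M                              # Gram matrix of the signatures
    step = 1.0 / np.linalg.norm(G, 2)        # 1 / largest eigenvalue of G
    B = pixels @ M                           # each row is (M^T r)^T
    A = np.zeros((pixels.shape[0], M.shape[1]))
    for _ in range(n_iter):
        A = np.maximum(A - step * (A @ G - B), 0.0)
    return A

def uncls(pixels, eps, max_targets=20):
    """UNCLS sketch: returns the list [t0, t1, ...] of target signatures."""
    targets = [pixels[np.argmax((pixels ** 2).sum(axis=1))]]   # t0: brightest
    while len(targets) < max_targets:
        M = np.column_stack(targets)                 # M = [t0 ... t_{k-1}]
        A = nnls_all(M, pixels)                      # abundance estimates
        lse = ((pixels - A @ M.T) ** 2).sum(axis=1)  # (5.11) for every pixel
        if lse.max() < eps:                          # step 3: stop
            break
        targets.append(pixels[np.argmax(lse)])       # step 4: t_k
    return targets
```

Replacing `nnls_all` with a solver that also enforces the abundance sum-to-one constraint would give the corresponding fully constrained variant.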



It is worth noting that the superscript (k) in $\hat{\alpha}_j^{(k)}(\mathbf{r})$ is a counter to indicate the 
number of iterations. It starts with k = 1. The subscript j in $\hat{\alpha}_j^{(k)}(\mathbf{r})$ starts with j = 1 
and is the index of the j-th target signature $\mathbf{t}_j$ generated by the UNCLS algorithm. The 
initial target is represented by $\mathbf{t}_0$ with j = 0. For example, $\hat{\alpha}_0^{(1)}(\mathbf{r})$ is the abundance 
estimate of $\mathbf{t}_0$ in the first iteration, as used in (5.9) and (5.10). It should also be noted that, as will 
be demonstrated in the experiments in Chapter 7, step 4 implemented in the UNCLS 
algorithm tries to locate pure pixel vectors first. If there is no such pixel vector, it then 
looks for a mixed pixel vector with the largest possible abundance fraction of any 
substance in the pixel vector. This implies that a mixed pixel vector with a uniform 
mixture is less likely to be selected by UNCLS as a target signature. 

Another comment is worthwhile. The UNCLS algorithm is primarily designed for 
unsupervised subpixel detection. It can be modified by replacing the NCLS algorithm in 
step 2 with the fully constrained least squares (FCLS) algorithm in Chapter 10 to make 
it an unsupervised FCLS (UFCLS) algorithm, which can be used for unsupervised 
subpixel quantification. 

5.5 EXPERIMENTS 

The UVQ algorithm is derived from a source coding perspective. It makes use of 
NNR to group image pixels into a set of p clusters that are used to form a p-codeword 
codebook with each codeword represented by a cluster center. UTGP and the 
UNCLS algorithm are developed from concepts arising in statistical signal processing, 
orthogonal projection and least squares error respectively. In order to demonstrate the 
performance of these three algorithms, the AVIRIS image in Fig. 1.6(a) and the 15-panel 
HYDICE scene in Fig. 1.7(a) were used for evaluation. The only knowledge required for 
the unsupervised algorithms is how many target signatures, p, should be generated before 
the algorithms are terminated. This is closely related to the virtual dimensionality (VD), 
which will be discussed in Chapter 17. In the following experiments, we used the values 
of VD in Table 17.2 for p, which are 4, 5 and 8 for the AVIRIS image and 13, 15 and 20 for 
the HYDICE image. After p target signatures were generated, their information was then 
used to perform OSP specified by (3.9). The UVQ algorithm, UTGP and the UNCLS algorithm 
implemented in conjunction with OSP are referred to as UVQ-OSP, UTGP-OSP and 
UNCLS-OSP respectively. 

Example 5.5.1 (AVIRIS Experiments) 



Figs. 5.1-5.3 show the detection results obtained by OSP for the AVIRIS image 
with p = 4 using the UVQ algorithm, UTGP and the UNCLS algorithm to generate 4 target 
signatures. 

As we can see from these figures, the target information produced by both UTGP 
and the UNCLS algorithm was more representative than that produced by the UVQ algorithm in 
terms of target detection. Nevertheless, none of the three algorithms could detect rhyolite. 

Similar experiments were also conducted for p = 5 and 8 and their detection results 
are shown in Figs. 5.4-5.9. 

A slight improvement was observed in Fig. 5.4 for UVQ-OSP where rhyolite was 
detected along with cinders. Interestingly, UTGP-OSP and UNCLS-OSP detected the 
anomaly in Figs. 5.5-5.6 and 5.8-5.9 which was missed in Figs. 5.2-5.3 with p = 4. 
Unfortunately, both detectors still missed rhyolite with p = 5, as shown in Figs. 5.5-5.6. 
However, as p increased to 8 all three detectors extracted the rhyolite, but 
only UNCLS-OSP detected it in a separate image. On the whole, UNCLS-OSP 
performed better than the other two. This is consistent with the results obtained in 
Chapter 3. When p is too small, such as p = 4, the generated target information may not 
be sufficient to represent the image scene. In this case, the target detection generally did 
not perform well regardless of which unsupervised algorithm was used to produce the target 
information. On the other hand, if p is too large, targets of the same type that contain 
different abundance fractions will be forced to split into more than one class. For 
example, the anomaly detected in Figs. 5.5-5.6 has a size of two pixels. It was detected in 
Fig. 5.8(b,f) and Fig. 5.9(c,h) where these two pixels were detected in two separate 
images. According to our experiments, p = 8 seems to be an appropriate number, which 
will be further justified by experiments in Chapter 15. 

Example 5.5.2 (HYDICE Experiments) 

The same experiments as in Example 5.5.1 were conducted for p = 13, 15, 20 using the 
15-panel HYDICE scene. As shown in Figs. 5.10-5.15 with p = 13, all three detectors 
had difficulty separating the panels in row 2 from the panels in row 3 as well as 
separating the panels in row 4 from the panels in row 5. This is because the panels in 
rows 2 and 3 were made of the same materials with different paints. So were the panels in 
rows 4 and 5. 

In order for UVQ-OSP to effectively detect the 15 panels, 40 target signatures were 
required to generate the necessary target information, as shown in Fig. 5.19. Even in this 
case, no single separate image can discriminate the panels in row 2 from the panels in 
other rows. However, the detection of the 15 panels can be accomplished by combining 
these separate images. Interestingly, these results were also consistent with the results 
obtained by Brumbley and Chang (1999), where the proposed unsupervised vector 
quantization-based target subspace projection approach is essentially UVQ-OSP, even 
though a different HYDICE scene was used. 

Similarly, 34 target signatures were required for UNCLS-OSP to effectively separate the 
15 panels, as shown in Fig. 5.20. 

The detection results in Fig. 5.20 were significantly better than those obtained by 
UVQ-OSP and also very close to those obtained by UTGP-OSP using 20 target 
signatures in Fig. 5.17. In Example 7.9 of Chapter 7, we will see that when the UNCLS 
algorithm is implemented in conjunction with NCLS in Chapter 3 for partially unsupervised 
subpixel detection, 34 target signatures are also required. The only difference between 
these two cases is the selection of the initial target signature. In UNCLS-OSP the initial 
target signature was selected by the UNCLS algorithm, while the initial target signature for 
UNCLS-NCLS was selected by prior knowledge. Nevertheless, their results are very 
close. 

The above experiments demonstrate that UVQ-OSP was not as effective as UTGP-OSP 
and UNCLS-OSP in unsupervised subpixel detection. This is because the target 
information produced by UVQ is mainly based on sample correlation, i.e. the nearest 
neighbor rule. It did not take advantage of spectral properties as did UTGP and the 
UNCLS algorithm, which are primarily designed for spectral analysis. More studies on 
the UNCLS algorithm and UTGP can be found in Chapters 10 and 13 respectively. 



5.6 CONCLUSIONS 

The OSP studied in Chapter 3 requires complete target knowledge. Insufficient target 
knowledge generally results in degradation in detection performance. In order to remedy 
this dilemma, this chapter presents three unsupervised algorithms, the UVQ algorithm, UTGP and the 
UNCLS algorithm, that are developed for this purpose. They can be used to obtain 
additional target information directly from the image data and further improve detection 
performance. By means of these algorithms, supervised subpixel detection can be 
extended to unsupervised subpixel detection so that the necessary target knowledge can 
be generated directly from image data in an unsupervised fashion. However, this also 
requires knowledge of the number of target signatures to generate, which must be 
provided a priori. This issue will be discussed in Chapter 17. In order to evaluate the 
effectiveness of these unsupervised algorithms, a comparative study was conducted through 
a series of experiments. The experimental results show that UTGP and the UNCLS 
algorithm are more effective than UVQ. 

AUTOMATIC SUBPIXEL DETECTION: 
ANOMALY DETECTION 



One type of automatic target detection, unsupervised subpixel detection, was 
considered in Chapter 5. This chapter considers another type of automatic target 
detection, anomaly detection. Unlike unsupervised subpixel detection, which requires 
generating a posteriori target information, anomaly detection does not need any target 
information at all. The purpose of anomaly detection is to locate targets 
which are generally unknown, but relatively small with low probabilities of occurrence in 
an image scene. These anomalous targets cannot be identified by prior knowledge. Two 
approaches are considered in this chapter, the RX algorithm (RXD) developed by Reed and Yu 
(1990) and the low probability detection (LPD) proposed in Harsanyi's dissertation 
(1993), both of which have the same functional form as CEM described in Chapter 4. The 
only difference is the matched target signature used in their filter design. The matched 
target signature used in CEM is the desired target signature d, whereas RXD and LPD 
use the sample pixel vector r and the unity vector 1 as their respective matched 
signatures. Like CEM, anomaly detection can also be implemented in real time. 



6.1 INTRODUCTION 

Automatic subpixel detection can be considered in two aspects. One is to detect 
targets of interest in an unknown image scene, where the desired target knowledge 
is generated directly from the scene by unsupervised means. Another is to detect 
anomalous targets in a blind scene where the targets are generally unknown and small, 
but interesting. Targets of this type generally have low probabilities of occurrence. The 
techniques developed for the former case are designed to find undesired or unwanted 
signatures and eliminate these target signatures before target detection takes 
place. So, a general approach is to produce a target signature matrix M that can well 
represent the entire image data including image background signatures. Because 
no sample spectral information is considered, such subpixel target detection can be 
considered as pixel-level spectral target detection and was discussed in Chapter 5. By 
contrast, the techniques developed for the latter case search for unknown but interesting 
targets via suppression of the image background. A commonly used method is to use the 
sample spectral correlation (or covariance) matrix for background suppression. Therefore, 
such detection can be viewed as sample correlation-based spectral target detection. An 
example of the relationship between these two approaches was given in Section 4.3 in 
Chapter 4, where the relationship between $P_{\mathbf{U}}^{\perp}$ used in OSP and $\mathbf{R}_{L\times L}^{-1}$ used in CEM was 
explored. Since the knowledge of the undesired target signature matrix U assumed to be 
known in $P_{\mathbf{U}}^{\perp}$ is not available in CEM, CEM must estimate $P_{\mathbf{U}}^{\perp}$ directly from the image 
data. One way to do so is to approximate $P_{\mathbf{U}}^{\perp}$ in the sense of minimum least-squares 
error using the spectral correlation information provided by sample pixel vectors in the 
image data. If we compare $\delta_{\mathrm{OSP}}(\mathbf{r}) = \mathbf{d}^T P_{\mathbf{U}}^{\perp} \mathbf{r}$ specified by (3.9) against 
$\delta_{\mathrm{CEM}}(\mathbf{r}) = (\mathbf{d}^T \mathbf{R}_{L\times L}^{-1} \mathbf{d})^{-1} \mathbf{d}^T \mathbf{R}_{L\times L}^{-1} \mathbf{r}$ specified by (4.8), we will discover that this is indeed the 
case: CEM uses the sample spectral information provided by $\mathbf{R}_{L\times L}^{-1}$ to estimate 
the spectral information of U used in OSP. More specifically, the undesired target spectral 
projector $P_{\mathbf{U}}^{\perp}$ is approximated by its minimum least-squares error based estimate $\mathbf{R}_{L\times L}^{-1}$, 
with $\kappa = (\mathbf{d}^T \mathbf{R}_{L\times L}^{-1} \mathbf{d})^{-1}$ included to account for the estimation error (Chang, 1998). If the 
image data to be processed are assumed to be zero-mean and uncorrelated, the sample 
spectral correlation matrix $\mathbf{R}_{L\times L}$ is reduced to the identity matrix and 
$\mathbf{d}^T \mathbf{R}_{L\times L}^{-1} \mathbf{d} = \mathbf{d}^T \mathbf{d}$. As a result, CEM becomes a normalized spectral matched filter, that is, 
$\delta_{\mathrm{CEM}}(\mathbf{r}) = (\mathbf{d}^T \mathbf{d})^{-1} \mathbf{d}^T \mathbf{r}$. On the other hand, if no sample spectral correlation is taken into 
account, i.e. $\mathbf{R}_{L\times L}^{-1} = \mathbf{I}$, and the desired target signature d is orthogonal to U, i.e. 
$P_{\mathbf{U}}^{\perp} \mathbf{d} = \mathbf{d}$, then $\delta_{\mathrm{OSP}}(\mathbf{r}) = \mathbf{d}^T P_{\mathbf{U}}^{\perp} \mathbf{r} = (P_{\mathbf{U}}^{\perp} \mathbf{d})^T \mathbf{r} = \mathbf{d}^T \mathbf{r}$. This implies that CEM is identical to 
OSP up to a normalization constant $\kappa = (\mathbf{d}^T \mathbf{d})^{-1}$. In other words, OSP can be considered a 
whitened version of CEM as illustrated in Section 4.3. However, as shown in Chang 
(1998), the constant $\kappa$ results from a least-squares estimation error of $\alpha$. It is 
bounded below by $(\mathbf{d}^T \mathbf{d})^{-1}$ with $(\mathbf{d}^T \mathbf{d})^{-1} \le \kappa = (\mathbf{d}^T P_{\mathbf{U}}^{\perp} \mathbf{d})^{-1} < \infty$. The optimal result can 
be achieved by the greatest lower bound, $\kappa = (\mathbf{d}^T \mathbf{d})^{-1}$. This makes sense since the sample 
spectral correlation matrix is an identity matrix and no portion of d will be projected 
onto $\langle \mathbf{U} \rangle$ via $P_{\mathbf{U}}$. Therefore, the information provided by d is not deteriorated; thus 
the matched filter achieves the best performance. In this case, both OSP and CEM wind 
up as an identical detector $(\mathbf{d}^T \mathbf{d})^{-1} \mathbf{d}^T \mathbf{r}$. Such OSP is called a posteriori OSP in Chapter 8. 
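
The special case discussed above can be checked numerically. The following is a small sketch with made-up numbers, assuming the standard forms of the two detectors quoted in the text; the function names `cem` and `osp` and all test vectors are hypothetical.

```python
import numpy as np

def cem(R, d, r):
    """CEM output (d^T R^{-1} d)^{-1} d^T R^{-1} r for a symmetric R."""
    Rinv_d = np.linalg.solve(R, d)
    return (Rinv_d @ r) / (d @ Rinv_d)

def osp(U, d, r):
    """OSP output d^T P_U^perp r with P_U^perp = I - U U^#."""
    P = np.eye(len(d)) - U @ np.linalg.pinv(U)
    return d @ P @ r

d = np.array([1.0, 2.0, 0.0])            # desired signature (made up)
U = np.array([[0.0], [0.0], [1.0]])      # undesired signature, orthogonal to d
r = np.array([0.5, 1.0, 4.0])            # a pixel vector (made up)
R = np.eye(3)                            # zero-mean, uncorrelated data

kappa = 1.0 / (d @ d)                    # normalization constant (d^T d)^{-1}
# With R = I and d orthogonal to U, CEM equals kappa times OSP.
print(cem(R, d, r), kappa * osp(U, d, r))
```

When d is not orthogonal to U, the quantity d^T P_U^perp d is smaller than d^T d, so the constant in the bound above becomes larger than (d^T d)^{-1}.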

Generally speaking, when there is no prior information available for subpixel target 
detection, the required target information such as the desired target signature d, target 
signature matrix M, must be estimated from the image data. This information may not 
be accurate due to the noise and measurement error. One way to avoid this dilemma is to 
design a detector, which does not need any target information. In this chapter, we address 
this issue and consider subpixel detection with no required prior information. The 
detector of this type can be designed by a matched filter that uses the currently being 
visited image pixel vector r as the matched signature. Two detectors are of interest. One 
is RXD where the matched signature is the sample image pixel vector r. Another is LPD 
with the constant unity vector 1 as its matched signature. As we will see, if the matched
signature r used in RXD and the unity vector 1 used in LPD are replaced with the desired
target signature d, both RXD and LPD become a non-normalized CEM.



6.2 RXD 

The main purpose of the unsupervised algorithms studied in Chapter 5 is to find an
appropriate U that can characterize the image background. The initial target in these
algorithms serves only for initialization. How it is chosen is generally not
important and does not affect the ultimate performance. Therefore, these algorithms are
not particularly designed for anomaly detection. An anomaly detector does not extract
targets randomly. It extracts unknown targets, which are supposed to be interesting and
meaningful, particularly spectrally distinct from their surroundings. In this section, we
describe two types of detectors that can be used to detect anomalous targets. Both are
very closely related to CEM discussed in Chapter 4. It is interesting to see that all
three take the same functional form of a matched filter but use different matched
signatures.

The RXD described in this chapter was developed by Reed and Yu (1990) and is referenced in
Ashton and Schaum (1998) and Stellman et al. (2000). More specifically,
for each image pixel vector r, RXD implements a filter specified by

$\delta_{\text{RXD}}(r) = (r-\mu)^T K_{L\times L}^{-1} (r-\mu)$     (6.1)

where $\mu$ is the sample mean and $K_{L\times L}$ is the sample spectral covariance matrix. The
$\delta_{\text{RXD}}(r)$ in (6.1) is actually the well-known (squared) Mahalanobis distance.
It has a form similar to that of CEM specified by (4.8), with the sample spectral
correlation matrix $R_{L\times L}$ and the matched signature d of CEM replaced by the sample
spectral covariance matrix $K_{L\times L}$ and the pixel vector $r - \mu$ respectively, and the scale
constant $\kappa = (d^T R_{L\times L}^{-1} d)^{-1}$ discarded. Since there is no prior information
required for RXD, it can only be used as an anomaly detector. Furthermore, as will be discussed
in Chapter 7, the scale constant that appears in CEM but not in RXD is a crucial factor in
detection performance because it reflects the information used in a detector. In
RXD, no such information is used; thus the scale constant $\kappa$ is missing in (6.1).
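As an illustration of (6.1), the sketch below scores every pixel of a synthetic data set by $(r-\mu)^T K^{-1} (r-\mu)$. The function name `rxd`, the $1/N$ normalization of the sample covariance matrix and the planted outlier are assumptions made for this example only.

```python
import numpy as np

def rxd(pixels):
    """RXD scores (r - mu)^T K^-1 (r - mu) for an (N, L) array of pixel vectors."""
    mu = pixels.mean(axis=0)                       # sample mean
    centered = pixels - mu
    K = centered.T @ centered / pixels.shape[0]    # sample covariance (1/N convention)
    K_inv = np.linalg.inv(K)
    # Quadratic form evaluated for every pixel at once.
    return np.einsum("ij,jk,ik->i", centered, K_inv, centered)

rng = np.random.default_rng(1)
N, L = 500, 5
pixels = rng.standard_normal((N, L))   # synthetic background
pixels[42] += 8.0                       # one spectrally distinct pixel

scores = rxd(pixels)
print(int(np.argmax(scores)))           # index of the most anomalous pixel
```

No target signature enters the computation; the detector relies only on the sample mean and the sample covariance matrix of the data.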

Mathematically, RXD can be considered as an inverse operation of principal
components analysis (PCA). PCA decorrelates the data matrix in such a manner that
different amounts of the image information are preserved in separate component
images, each of which represents a different piece of uncorrelated image information. The
strength of PCA is thus to compress most of the significant image information into a few
major principal components specified by the eigenvectors of $K_{L\times L}$ that correspond to
large eigenvalues; it is not designed for detection or classification. On the contrary, if
the image data contain interesting target samples which occur only with low probabilities
in the data (i.e. the number of target samples is small), these targets will not show up in
major principal components, but rather in minor components specified by small
eigenvalues. This phenomenon was observed and demonstrated in Chang and Heinz
(2000b). It explains why RXD works for anomaly detection. Recalling (6.1), RXD looks for
anomalous targets by calculating $(r-\mu)^T K_{L\times L}^{-1} (r-\mu)$. If there is
an anomaly in the data with relatively small energy, it will be specified by a small
eigenvalue. In this case, a small eigenvalue will produce a large value of
$(r-\mu)^T K_{L\times L}^{-1} (r-\mu)$. This is equivalent to searching for minor components by
finding the smaller eigenvalues of $K_{L\times L}$. As a consequence of this operation, RXD can
be very effective in detecting small targets, which are retained in minor components. One
problem with this approach is how to separate small eigenvalues from noise variance in
the data. This is determined by the intrinsic dimensionality of the data. It is a noise
sensitivity issue, which will be discussed in Chapter 7. The explanation described above
can be derived mathematically as follows using the spectral decomposition of a
covariance matrix in Poor (1992) and Stark and Woods (1991).

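Assume that $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_L$ are the eigenvalues of
$K_{L\times L}$ with zero mean, $\mu = 0$, and $\{v_1, v_2, \ldots, v_L\}$ are their associated
orthogonal unit eigenvectors (i.e. the vector length is one) with $v_l$ corresponding to
$\lambda_l$. Let $A = [v_1 \, v_2 \cdots v_L]$ be an eigen-matrix with the l-th column specified
by the l-th eigenvector $v_l$. Then the matrix A is a unitary matrix and
can be used to decorrelate $K_{L\times L}$ into a diagonal matrix
$\Lambda = \mathrm{diag}\{\lambda_1, \ldots, \lambda_L\}$ such that
$A^T K_{L\times L} A = \Lambda$. If we let $y = A^T r$, then

$r^T K_{L\times L} r = (Ay)^T K_{L\times L} (Ay) = y^T [A^T K_{L\times L} A] y = y^T \Lambda y = \sum_{l=1}^{L} \lambda_l y_l^2$     (6.2)

By means of (6.2) we can further represent RXD by

$r^T K_{L\times L}^{-1} r = \sum_{l=1}^{L} \lambda_l^{-1} y_l^2$     (6.3)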
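The two expansions $r^T K r = \sum_l \lambda_l y_l^2$ and $r^T K^{-1} r = \sum_l \lambda_l^{-1} y_l^2$ with $y = A^T r$ can be verified on synthetic data. The sketch below is only a numerical check; the band variances and the $1/N$ covariance normalization are arbitrary choices made for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
N, L = 1000, 4
X = rng.standard_normal((N, L)) * np.array([5.0, 2.0, 1.0, 0.2])  # unequal band variances
X -= X.mean(axis=0)                  # enforce the zero-mean assumption mu = 0
K = X.T @ X / N                      # sample covariance matrix

lam, A = np.linalg.eigh(K)           # eigenvalues and orthonormal eigenvectors (columns of A)
r = X[0]
y = A.T @ r                          # decorrelated coordinates y = A^T r

quad_62 = r @ K @ r                  # r^T K r
quad_63 = r @ np.linalg.solve(K, r)  # r^T K^-1 r

print(np.isclose(quad_62, np.sum(lam * y**2)))   # expansion weighted by eigenvalues
print(np.isclose(quad_63, np.sum(y**2 / lam)))   # expansion weighted by inverse eigenvalues
```

In the second sum, components along eigenvectors with small eigenvalues receive the largest weights, which is the mechanism that lets RXD emphasize minor components.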



According to (6.2), the larger the eigenvalue is, the greater the value of
$r^T K_{L\times L} r$ is. So, using (6.2) we are able to compress data by retaining only a few
principal components specified by the first few largest eigenvalues. By contrast, (6.3) allows
RXD to detect anomalous targets with small energies that are represented by small eigenvalues.
This is because the smaller the eigenvalue is, the greater the value of
$r^T K_{L\times L}^{-1} r$ is. In this case, it is crucial to determine p, the number of targets
present in the image scene, so that the eigenvalues beyond the first p largest values will be
noise energies. Fig. 6.1(a-b) shows the results of the LCVF scene in Fig. 1.6(a) produced by the
first principal component of the standard PCA and (6.2) respectively, where Figs. 6.1(a) and
6.1(b) are comparable and both preserve most of the information of the image scene. Fig. 6.1(c)
was generated by RXD using (6.1) or (6.3), where RXD detected a two-pixel anomaly on the upper
left edge of the dry lake along with a small portion of vegetation in the upper right corner.

[Figure 6.1 panels: (a) PCA; (b) $r^T K_{L\times L} r$; (c) RXD]



As noted in the introduction of this chapter, OSP, CEM and RXD all operate as some
form of a matched filter, but their performance is closely related to the scale constant $\kappa$
that appears in front of their matched filters. With this interpretation, two variants of
RXD, referred to as normalized RXD, denoted by $\delta_{\text{NRXD}}(r)$, and modified RXD, denoted
by $\delta_{\text{MRXD}}(r)$, can be derived from (6.1) for anomaly detection as follows.

$\delta_{\text{NRXD}}(r) = \dfrac{(r-\mu)^T K_{L\times L}^{-1} (r-\mu)}{(r-\mu)^T (r-\mu)}$     (6.4)

which can be viewed as RXD with the scale constant $\kappa = [(r-\mu)^T (r-\mu)]^{-1}$, as well as
a matched filter with the matched signature given by $[(r-\mu)^T (r-\mu)]^{-1/2} (r-\mu)$ with the
scale constant $\kappa = [(r-\mu)^T (r-\mu)]^{-1/2}$. Or equivalently,

$\delta_{\text{NRXD}}(r) = \left([(r-\mu)^T (r-\mu)]^{-1/2} (r-\mu)\right)^T K_{L\times L}^{-1} \left([(r-\mu)^T (r-\mu)]^{-1/2} (r-\mu)\right) = \tilde{r}^T K_{L\times L}^{-1} \tilde{r}$     (6.5)

with $\tilde{r} = [(r-\mu)^T (r-\mu)]^{-1/2} (r-\mu)$, which can be considered as normalized RXD
(NRXD).

Another variant of RXD is

$\delta_{\text{MRXD}}(r) = [(r-\mu)^T (r-\mu)]^{-1/2} (r-\mu)^T K_{L\times L}^{-1} (r-\mu)$     (6.6)

which can be considered as RXD with the scale constant $\kappa = [(r-\mu)^T (r-\mu)]^{-1/2}$ or
interpreted as a matched filter with the matched signal given by $\tilde{r}$ with the scale
constant $\kappa = 1$, where $\tilde{r} = [(r-\mu)^T (r-\mu)]^{-1/2} (r-\mu)$ is the normalized
image pixel vector of $r - \mu$.

Fig. 6.2(a-b) shows the results of $\delta_{\text{NRXD}}(r)$ and $\delta_{\text{MRXD}}(r)$.
Interestingly, instead of detecting the anomaly as RXD did, $\delta_{\text{NRXD}}(r)$ and
$\delta_{\text{MRXD}}(r)$ extracted different signatures
such as background pixels and edge pixels of the dry lake.
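The two variants differ from RXD only in how the quadratic form is scaled by the energy $(r-\mu)^T (r-\mu)$ of the centered pixel. A minimal sketch, assuming NRXD divides the RXD score by this energy and MRXD divides it by the square root of this energy, is given below; the function name `rxd_variants` and the synthetic data are hypothetical.

```python
import numpy as np

def rxd_variants(pixels):
    """Return RXD, NRXD and MRXD scores for an (N, L) array of pixel vectors."""
    mu = pixels.mean(axis=0)
    c = pixels - mu                                    # centered pixels r - mu
    K_inv = np.linalg.inv(c.T @ c / pixels.shape[0])   # inverse sample covariance
    quad = np.einsum("ij,jk,ik->i", c, K_inv, c)       # (r - mu)^T K^-1 (r - mu)
    energy = np.einsum("ij,ij->i", c, c)               # (r - mu)^T (r - mu), assumed nonzero
    return quad, quad / energy, quad / np.sqrt(energy)

rng = np.random.default_rng(3)
pixels = rng.standard_normal((300, 5)) + 10.0
rxd, nrxd, mrxd = rxd_variants(pixels)
print(rxd[:3], nrxd[:3], mrxd[:3])
```

A pixel that coincides exactly with the sample mean has zero energy and would need separate handling; the sketch does not guard against that case.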



6.3 LPTD AND UTD 



Another type of anomaly detection was previously derived in Harsanyi's dissertation
(1993). It is called the low probability target detector (LPTD) and given by

$\delta_{\text{LPTD}}(r) = \mathbf{1}^T R_{L\times L}^{-1} r$     (6.7)

which is based on the sample correlation matrix $R_{L\times L}$. If we replace $R_{L\times L}$
with the sample covariance matrix $K_{L\times L}$, we can define an alternative LPTD using the
sample covariance matrix, referred to as the uniform target detector (UTD), which is
given by

$\delta_{\text{UTD}}(r) = (\mathbf{1}-\mu)^T K_{L\times L}^{-1} (r-\mu)$     (6.8)

where $\mathbf{1} = (1, 1, \ldots, 1)^T$ is the L-dimensional unity vector with ones in all the
components. The UTD also uses the unity vector 1 as its matched signature. The reason for
choosing the unity vector is the following. Because there is no prior information available,
the best scenario is not to introduce any information into the detector. In this case, the
anomalous targets are assumed to have radiance uniformly distributed over all the spectral
bands. As a consequence, background signatures will be extracted as anomalies. However, if
there is
some partial knowledge available, the unity vector can be replaced by a certain specific 
vector. For example, if we are interested in the near-infrared wavelengths, we can set 0's for all
visible bands while assigning ones to all the near infrared bands. Fig. 6.3(a) shows the 
result of the LCVF in Fig. 3.1 produced by UTD in which it did not detect the anomaly 
shown in Fig. 6.1(c); instead, it detected most of the image background. This is exactly 
what we expect from the unity vector since the background signatures can be assumed to 
be uniformly distributed.



It is interesting to note that background subtraction could enhance RXD
detection performance as shown in Ashton and Schaum (1998). This suggests that we can
incorporate UTD into RXD to remove the image background as well as noise to improve
the performance of RXD. This advantage enables us to design a new type of anomaly
detector. Instead of following Ashton and Schaum's approach, which used the constrained
linear mixture model to estimate the image background, we design an anomaly detector,
denoted by $\delta_{\text{RXD-UTD}}(r)$, by subtracting UTD from RXD as follows.

$\delta_{\text{RXD-UTD}}(r) = (r-\mathbf{1})^T K_{L\times L}^{-1} (r-\mu)$.     (6.9)

Fig. 6.3(b) shows the result of $\delta_{\text{RXD-UTD}}(r)$, with no noticeable improvement by
visual inspection. However, the magnitude of the anomaly was slightly improved. This is because
the total energy of the background in the LCVF scene is less than that of the anomaly. In this
case, the background subtraction using (6.8) did not improve much over RXD.
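The relationship among these detectors can be sketched in code. The example below assumes that LPTD takes the form $\mathbf{1}^T R^{-1} r$ and UTD the form $(\mathbf{1}-\mu)^T K^{-1} (r-\mu)$, so that subtracting UTD from RXD gives $(r-\mathbf{1})^T K^{-1} (r-\mu)$. The function names and the synthetic data are hypothetical.

```python
import numpy as np

def lptd(pixels):
    """LPTD scores 1^T R^-1 r, with R the sample correlation matrix."""
    N, L = pixels.shape
    R = pixels.T @ pixels / N
    return pixels @ np.linalg.solve(R, np.ones(L))   # r^T R^-1 1 equals 1^T R^-1 r

def utd_and_rxd_utd(pixels):
    """UTD and RXD-UTD scores based on the sample covariance matrix K."""
    N, L = pixels.shape
    mu = pixels.mean(axis=0)
    c = pixels - mu
    K_inv = np.linalg.inv(c.T @ c / N)
    utd = c @ K_inv @ (np.ones(L) - mu)                          # (1 - mu)^T K^-1 (r - mu)
    rxd_utd = np.einsum("ij,jk,ik->i", pixels - 1.0, K_inv, c)   # (r - 1)^T K^-1 (r - mu)
    return utd, rxd_utd

rng = np.random.default_rng(4)
pixels = rng.random((400, 6)) + 0.5
lptd_scores = lptd(pixels)
utd_scores, rxd_utd_scores = utd_and_rxd_utd(pixels)
print(lptd_scores[:3], utd_scores[:3], rxd_utd_scores[:3])
```

Replacing `np.ones(L)` with a band-selective vector of zeros and ones would correspond to the partial-knowledge case in which only some spectral bands are of interest.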

As another example, the 15-panel HYDICE scene was also used for experiments.
Fig. 6.4 shows the results produced by PCA and (6.2). In Fig. 6.4(a), PCA retained
most of the image information in its first component, which included the forest on the left
edge and some panels. So, from an information preservation viewpoint, PCA achieved
its goal. However, if we look at Fig. 6.4(b),
$(r-\mu)^T K_{L\times L} (r-\mu)$ specified by (6.2) extracted only the forest. Therefore, from a
classification point of view, PCA did not perform well since it could not separate the
forest from the panels in one component image. It is interesting to note that, compared to
PCA, $(r-\mu)^T K_{L\times L} (r-\mu)$ successfully separated the forest in an individual component
image. This implies that $(r-\mu)^T K_{L\times L} (r-\mu)$ performed better than PCA in terms of
target detection.

Furthermore, Fig. 6.5 shows the detection results of the five anomaly detectors,
$\delta_{\text{RXD}}(r)$, $\delta_{\text{NRXD}}(r)$, $\delta_{\text{MRXD}}(r)$,
$\delta_{\text{UTD}}(r)$ and $\delta_{\text{RXD-UTD}}(r)$, where $\delta_{\text{RXD}}(r)$ and
$\delta_{\text{RXD-UTD}}(r)$ performed nearly the same. They detected an interferer in the upper
left corner and all the panels in the first column, but missed all the panels in the second and
third columns as shown in Fig. 6.5(a,e), as opposed to $\delta_{\text{UTD}}(r)$, which extracted
the image background as expected in Fig. 6.5(d). Apparently, $\delta_{\text{NRXD}}(r)$ and
$\delta_{\text{MRXD}}(r)$ did not detect any panel in the scene.




The hyperspectral image experiments conducted in Figs. 6.2 and 6.5 may lead to the
conclusion that NRXD, MRXD and UTD could only perform background extraction. As
will be shown in the following experiments, this is generally not the case. It is due to
the fact that the information used in the detectors was $K_{L\times L}$, from which the
information of the first-order statistics had been removed. If the spectral properties of target
signatures can only be characterized by the first-order statistics, these targets may not be
detected by a detector designed solely on the basis of the second-order statistics. This is
particularly true for nonstationary images such as remotely sensed imagery. In order to account
for both the first-order and second-order statistics, we substitute $R_{L\times L}$ for
$K_{L\times L}$ and r for $r - \mu$ in (6.1) and (6.4)-(6.8). Figs. 6.6 and 6.7 show the detection
results for the AVIRIS and HYDICE images respectively. In this case, UTD becomes LPTD. From these
figures, we can see that the performance of RXD and RXD-UTD remained almost unchanged while
UTD still detected background and noise. To our surprise, NRXD and MRXD
performed quite differently. Unlike Fig. 6.2, both NRXD and MRXD detected the shade
in Fig. 6.6(b-c). In addition to the shade, MRXD also detected the anomaly in Fig.
6.6(c). Most interesting is Fig. 6.7(b-c). Compared to Fig. 6.5(b-c), where NRXD and
MRXD extracted only image background, Fig. 6.7(b-c) shows that NRXD and MRXD
detected panels that were also detected by RXD. Besides, they both also extracted some
tree signatures and interferers.

These two experiments demonstrated that when images are not stationary and cannot
be characterized by only second-order statistics, using the sample correlation matrix
$R_{L\times L}$ is more appropriate and effective than using the sample covariance matrix
$K_{L\times L}$ to capture image characteristics.
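The role of the first-order statistics can be made explicit. With both matrices normalized by $1/N$, the sample correlation matrix and the sample covariance matrix satisfy $R = K + \mu\mu^T$, so the correlation matrix retains the sample mean that the covariance matrix removes. The sketch below checks this identity on synthetic nonzero-mean data; the data and variable names are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(5)
N, L = 2000, 4
pixels = rng.random((N, L)) + np.array([0.5, 1.0, 1.5, 2.0])   # nonzero-mean toy data

mu = pixels.mean(axis=0)
R = pixels.T @ pixels / N                    # sample correlation matrix
K = (pixels - mu).T @ (pixels - mu) / N      # sample covariance matrix

# R differs from K by the rank-one term built from the sample mean.
print(np.allclose(R, K + np.outer(mu, mu)))
```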

6.4 RELATIONSHIP BETWEEN CEM AND RXD 

As discussed in Section 4.3, $\delta_{\text{OSP}}(r)$ and $\delta_{\text{CEM}}(r)$ each perform
some sort of a matched filter with a different scale constant $\kappa$ appearing in front of the
matched filter. In order to investigate the relationship between CEM and RXD, we are interested
in an RXD which implements r and $R_{L\times L}$ in place of $(r-\mu)$ and $K_{L\times L}$
respectively. Such an $R_{L\times L}$-based RXD is referred to as R-RXD, denoted by
$\delta_{\text{R-RXD}}(r)$ hereafter. Unlike $\delta_{\text{CEM}}(r)$, which requires
prior knowledge of the desired target d as the matched signal, $\delta_{\text{R-RXD}}(r)$ does not
have any prior target information and can be implemented with no prior knowledge. So,
it is intuitive to choose the image pixel r currently being processed as its matched
signature. Also note that since anomaly detection primarily performs target detection, not
abundance estimation, the constant $\kappa$ in $\delta_{\text{CEM}}(r)$ may not be a crucial
parameter and can be set to $\kappa = 1$. So, if we replace the matched signature d in
$\delta_{\text{CEM}}(r)$ with r and discard the constant
$\kappa = (d^T R_{L\times L}^{-1} d)^{-1}$ by setting $\kappa = 1$, $\delta_{\text{CEM}}(r)$ becomes
$\delta_{\text{R-RXD}}(r) = r^T R_{L\times L}^{-1} r$.
In analogy with (6.4) and (6.6), we can also define normalized R-RXD, $\delta_{\text{R-NRXD}}(r)$,
and modified R-RXD, $\delta_{\text{R-MRXD}}(r)$, as follows.

$\delta_{\text{R-NRXD}}(r) = (r/\|r\|)^T R_{L\times L}^{-1} (r/\|r\|) = \|r\|^{-2} \, r^T R_{L\times L}^{-1} r = (r/\|r\|^2)^T R_{L\times L}^{-1} r$     (6.10)

$\delta_{\text{R-MRXD}}(r) = \|r\|^{-1} \, r^T R_{L\times L}^{-1} r = (r/\|r\|)^T R_{L\times L}^{-1} r$     (6.11)

where $\|r\| = \sqrt{r^T r}$ is the norm (vector length) of r.

Following the same treatment as for NRXD and MRXD, $\delta_{\text{R-NRXD}}(r)$ specified by (6.10)
can be interpreted in three different ways. One is as a normalized version of R-RXD.
Another interpretation is that $\delta_{\text{R-NRXD}}(r)$ can be regarded as an R-RXD with
the matched signature d = r and a different scale constant $\kappa = \|r\|^{-2}$. Or equivalently,
$\delta_{\text{R-NRXD}}(r)$ can be thought of as a matched filter with the matched signature given by
$d = r/\|r\|^2$ with $\kappa = 1$. Similarly, $\delta_{\text{R-MRXD}}(r)$ specified by (6.11) can
be interpreted as an R-RXD with the constant $\kappa = \|r\|^{-1}$ or a matched filter with the
matched signature $d = r/\|r\|$ and $\kappa = 1$.
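The three correlation-based detectors can be computed together, since they share the quadratic form $r^T R^{-1} r$ and differ only in the power of $\|r\|$ used for scaling. The sketch below assumes R-RXD is $r^T R^{-1} r$, R-NRXD is that value divided by $\|r\|^2$ and R-MRXD is that value divided by $\|r\|$; the function name `r_rxd_family` is hypothetical.

```python
import numpy as np

def r_rxd_family(pixels):
    """Return R-RXD, R-NRXD and R-MRXD scores for an (N, L) array of pixel vectors."""
    N, L = pixels.shape
    R_inv = np.linalg.inv(pixels.T @ pixels / N)           # inverse sample correlation matrix
    quad = np.einsum("ij,jk,ik->i", pixels, R_inv, pixels) # r^T R^-1 r
    norm = np.linalg.norm(pixels, axis=1)                  # ||r||, assumed nonzero
    return quad, quad / norm**2, quad / norm

rng = np.random.default_rng(6)
pixels = rng.random((250, 5)) + 0.1
r_rxd, r_nrxd, r_mrxd = r_rxd_family(pixels)
print(r_rxd[:3], r_nrxd[:3], r_mrxd[:3])
```

No sample mean is needed here, which is what makes this family a natural starting point for causal processing.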

The $\delta_{\text{LPTD}}(r)$ and $\delta_{\text{UTD}}(r)$ can be interpreted as matched filters
with the matched signature specified by an L x 1 dimensional unity vector 1 with $\kappa = 1$. In
analogy with $\delta_{\text{R-RXD}}(r)$, we can set the constant $\kappa$ to 1 since
$\delta_{\text{LPTD}}(r)$ and $\delta_{\text{UTD}}(r)$ are only used for target detection. The
difference between $\delta_{\text{R-RXD}}(r)$ and $\delta_{\text{LPTD}}(r)$,
$\delta_{\text{UTD}}(r)$ is that $\delta_{\text{R-RXD}}(r)$ uses r as its matched signature, while
$\delta_{\text{LPTD}}(r)$ and $\delta_{\text{UTD}}(r)$ use 1 as their matched signature. Because
they are anomaly detectors, not abundance estimators, there is no need to use the constant
$\kappa$ to account for estimation error.

In order to conduct a quantitative study among these $R_{L\times L}$-based anomaly detectors,
the data used in Section 4.5.1 were also used for computer simulations where 400 mixed
pixel vectors, $\{r_i\}_{i=1}^{400}$, were simulated as shown in Fig. 4.9. Fig. 6.8(a-d) shows
anomaly detection of 5 creosote leaves pixels by $\delta_{\text{R-RXD}}(r)$,
$\delta_{\text{R-NRXD}}(r)$, $\delta_{\text{R-MRXD}}(r)$ and $\delta_{\text{LPTD}}(r)$, where two
of the detectors performed better than a third in detection of creosote leaves, while the
remaining one basically extracted most background pixels.

Compared to the results in Figs. 4.10 and 4.11, none of the four anomaly detectors can
compete with $\delta_{\text{CEM}}(r)$ and $\delta_{\text{TCIMF}}(r)$ in detection of creosote
leaves. This is because the latter require specific prior knowledge to detect targets, as opposed
to an anomaly detector, which has no a priori knowledge but looks for targets with signatures
spectrally distinct from neighboring pixels.







Figure 6.8. Detection results of (a) $\delta_{\text{R-RXD}}(r)$; (b) $\delta_{\text{R-NRXD}}(r)$; (c) $\delta_{\text{R-MRXD}}(r)$; (d) $\delta_{\text{LPTD}}(r)$



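Like Table 4.1, we summarize the relationship among $\delta_{\text{CEM}}(r)$,
$\delta_{\text{R-RXD}}(r)$, $\delta_{\text{R-NRXD}}(r)$, $\delta_{\text{R-MRXD}}(r)$ and
$\delta_{\text{LPTD}}(r)$ in Table 6.1 in terms of their matched signatures and
their corresponding constants $\kappa$. It should be noted that all four sample correlation
matrix $R_{L\times L}$-based anomaly detectors use $R_{L\times L}^{-1}$ in their filter design.

Table 6.1. Relationship among $\delta_{\text{CEM}}(r)$, $\delta_{\text{R-RXD}}(r)$, $\delta_{\text{R-NRXD}}(r)$, $\delta_{\text{R-MRXD}}(r)$ and $\delta_{\text{LPTD}}(r)$

Detector                       $\kappa$                             Matched signature   Filter vector w
$\delta_{\text{CEM}}(r)$       $(d^T R_{L\times L}^{-1} d)^{-1}$    $d$                 $d^T R_{L\times L}^{-1}$ (4.8)
$\delta_{\text{R-RXD}}(r)$     1                                    $r$                 $r^T R_{L\times L}^{-1}$ (6.1)
$\delta_{\text{R-NRXD}}(r)$    1                                    $r/\|r\|^2$         $(r/\|r\|^2)^T R_{L\times L}^{-1}$ (6.10)
$\delta_{\text{R-MRXD}}(r)$    1                                    $r/\|r\|$           $(r/\|r\|)^T R_{L\times L}^{-1}$ (6.11)
$\delta_{\text{LPTD}}(r)$      1                                    $\mathbf{1}$        $\mathbf{1}^T R_{L\times L}^{-1}$ (6.7)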



6.5 REAL-TIME PROCESSING 

The RXD defined by (6.1) cannot be implemented in real time because the
computation of $K_{L\times L}$ requires the sample mean of the entire image. On the
other hand, it has been demonstrated in Chang et al. (2001) that the computation of the
inverse of the sample correlation matrix, $R_{L\times L}^{-1}$, can be implemented by a real-time
process via a QR-decomposition (Golub and Van Loan, 1989). Comparing Fig. 6.1(c) against
Fig. 6.6(a) and Fig. 6.5(a) against Fig. 6.7(a), there was no visible or appreciable difference
between RXD using $K_{L\times L}$ and RXD using $R_{L\times L}$. Fig. 6.9(a-b) shows their
respective differential images obtained by taking the absolute difference between Fig. 6.1(c) and
6.6(a) for the AVIRIS image and the absolute difference between Fig. 6.5(a) and 6.7(a) for the
HYDICE image. The plots next to the images show their corresponding abundance.

In Fig. 6.9(a), the only visible difference is the highest peak, which results from the two
detectors detecting different amounts of abundance in the anomaly. In Fig. 6.9(b), both
detectors detected nearly the same amount of abundance in all the panels, but different
abundance fractions of background signatures, specifically, the one at the bottom right.

As also demonstrated in Figs. 6.2 and 6.6(b-c), and in Figs. 6.5(b-c) and 6.7(b-c),
$R_{L\times L}$ might be better than $K_{L\times L}$ in characterizing spectral properties of
non-natural targets. So, in order to implement RXD in real time, it is advantageous to
use RXD with $K_{L\times L}$ replaced by $R_{L\times L}$, i.e., R-RXD. The resulting real-time
R-RXD is referred to as causal R-RXD (CRXD). Since the information used in CRXD is updated in a
causal manner, it may yield results different from those produced by R-RXD, as demonstrated in
the following experiments.
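The causal update can be sketched as follows. This is only an illustration under simplifying assumptions: it accumulates the sum of outer products $r r^T$ line by line and inverts the running correlation matrix directly, rather than using the QR-decomposition approach cited above. The function name `crxd_line_by_line` and the random data cube are hypothetical.

```python
import numpy as np

def crxd_line_by_line(cube):
    """Causal R-RXD on a (rows, cols, L) cube.

    Each line is scored with the correlation matrix of all pixels from
    the first line up to and including the current line.
    """
    rows, cols, L = cube.shape
    outer_sum = np.zeros((L, L))
    count = 0
    scores = np.empty((rows, cols))
    for i in range(rows):
        line = cube[i]                       # (cols, L) pixels of the current line
        outer_sum += line.T @ line           # accumulate the sum of r r^T
        count += cols
        R = outer_sum / count                # causal sample correlation matrix
        R_inv = np.linalg.pinv(R)            # pseudo-inverse guards against early rank deficiency
        scores[i] = np.einsum("ij,jk,ik->i", line, R_inv, line)
    return scores

rng = np.random.default_rng(7)
cube = rng.random((6, 10, 4))
causal = crxd_line_by_line(cube)
print(causal.shape)
```

Only the last line is scored with the same correlation matrix as the noncausal R-RXD; earlier lines see less of the image, which is why the causal and noncausal results can differ.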

Analogous to the line-by-line real-time processing implemented for CEM in Section
4.6, we implemented the same line-by-line real-time processing for CRXD on the
AVIRIS image. The result is shown in Fig. 6.10 where the vegetation was first detected
in the first 15 lines, then disappeared as soon as the anomaly was detected at pixel
180. Finally, when the process was completed, the vegetation eventually vanished while
the anomaly was enhanced.

Compared to Fig. 6.5(a), it is interesting to observe that the panel pixels in row 1
were first detected, then faded away when an interferer was detected in the upper left
corner. After this stage, their visibility gradually diminished and eventually vanished
when the process was completed. This experiment demonstrated a subtle difference
between the line-by-line real-time processing of CRXD and RXD using $K_{L\times L}$, where the
former used the causal information up to the line containing the pixel being processed,
while the latter used noncausal information provided by the entire image.

Depending upon how the causal information is used to calculate $R_{L\times L}$, a real-time
implementation can be carried out in two different ways. One is the line-by-line real-time
processing described above. It uses the line-by-line causal information
provided by the pixels from the first line up to the line currently being examined by
CRXD. Another is pixel-by-pixel real-time processing, which uses the pixel-by-pixel
causal information provided by all pixels up to the one currently being examined
by CRXD. Since CEM used the prior knowledge of the desired target signature, these
two real-time processes resulted in very little difference for CEM. However, this is not true
for CRXD because there is no assumed prior knowledge and the anomalies are generally
sensitive to noise due to their small energies. Consequently, CRXD is sensitive to the
information used in $R_{L\times L}$. Fig. 6.12 shows the experiment of the pixel-by-pixel
real-time processing of CRXD for the 15-panel HYDICE image. If we compare Fig. 6.11 to Fig.
6.12, there were more background signatures detected in Fig. 6.11 than in Fig. 6.12,
specifically, up to the detection of panels in row 3. However, it was found that for the
AVIRIS image there was no visible difference between the line-by-line processing and
pixel-by-pixel processing. This may be because of the low 20-m spatial resolution of the
AVIRIS image, with pixel-by-pixel real-time processing being sensitive only at high
spatial resolution. The results are not included here.

As a final comment, two significant advantages of using real-time processing are
computational efficiency and detection of moving targets. This is particularly useful for
hyperspectral imagery because the data cube is generally very large.
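For the pixel-by-pixel causal processing described above, the inverse of the running correlation matrix can be updated without re-inverting it at every pixel. The sketch below uses the Sherman-Morrison rank-one update, which is a swapped-in technique chosen for illustration and not the method used in the experiments; the regularized start $S_0 = \epsilon I$, the parameter `eps` and the function name `crxd_pixel_by_pixel` are assumptions.

```python
import numpy as np

def crxd_pixel_by_pixel(pixels, eps=1e-3):
    """Causal R-RXD over an (N, L) pixel stream using Sherman-Morrison updates.

    S_n = eps * I + sum of r r^T over the first n pixels, and R_n = S_n / n.
    """
    N, L = pixels.shape
    S_inv = np.eye(L) / eps                       # inverse of the regularized start S_0
    scores = np.empty(N)
    for n, r in enumerate(pixels, start=1):
        u = S_inv @ r
        S_inv -= np.outer(u, u) / (1.0 + r @ u)   # inverse of S_{n-1} + r r^T
        scores[n - 1] = n * (r @ S_inv @ r)       # r^T R_n^-1 r with R_n = S_n / n
    return scores

rng = np.random.default_rng(8)
stream = rng.random((200, 4))
scores = crxd_pixel_by_pixel(stream)
print(scores[-3:])
```

Each update costs on the order of $L^2$ operations instead of the $L^3$ needed for a fresh inversion, which is one way the computational advantage of causal processing can be realized.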



6.6 CONCLUSIONS 



Although the unsupervised subpixel detection considered in Chapter 5 required no
prior target knowledge, it did make use of target information generated by an
unsupervised algorithm to perform supervised subpixel detection. Target information
obtained in this way can be viewed as unsupervised target knowledge. This chapter considers
anomaly detection, which does not require unsupervised target knowledge. In surveillance
applications, we are often interested in anomalies that are relatively small objects with
low probabilities in an image, but do not belong to the image background. These
anomalous targets are generally unknown a priori and cannot be identified from an image
scene visually. To meet this need, several anomaly detectors are developed to extract such
targets. They all operate in a functional form of a matched filter with a different matched
signature. Furthermore, they can also be implemented in a real-time process.
Unfortunately, there is a disadvantage resulting from the absence of prior knowledge. An anomaly
detector cannot classify the targets it detects since it cannot discriminate its detected
targets from one another. In order to extend anomaly detection to anomaly classification,
criteria for target discrimination are required. This issue will be addressed in Chapter 14.

SENSITIVITY OF SUBPIXEL DETECTION



In Chapter 3, we considered the target abundance-constrained subpixel detection
(TACSD) and evaluated three techniques, OSP, SCLS and NCLS. A major limitation of
these techniques is the requirement of complete target knowledge. In Chapter 4, we
studied the target signature-constrained subpixel detection (TSCSD) and evaluated
two techniques, CEM and TCIMF. However, their performance is sensitive to the target
knowledge used in their filter design and is also closely related to the intrinsic
dimensionality (ID) of $R_{L\times L}$, which is generally unknown and needs to be estimated.
Unfortunately, the accuracy of the ID estimation is determined by noise level. Similarly,
the anomaly detection developed in Chapter 6 also requires the computation of the inverse of a
sample correlation or covariance matrix, with performance also affected by noise sensitivity.
So, this chapter is devoted to a thorough investigation of two issues related to detection
performance. One is the sensitivity to target knowledge. Another is the noise sensitivity of the
computation of $R_{L\times L}^{-1}$ required for TSCSD. Because the noise variances are determined
by eigenvalues, the performance of TSCSD will be evaluated based on the number of eigenvectors
used to calculate $R_{L\times L}^{-1}$. As will be demonstrated, this number plays a significant
role in detection performance.



7.1 INTRODUCTION 

In Chapters 3 and 4, we investigated TACSD and TSCSD respectively. Several
distinctions between these two approaches are worth noting. The most prominent is that
TACSD is basically a pixel-level spectral analysis technique, which operates on a
pixel-by-pixel basis. It takes advantage of inter-band spectral information provided by
individual bands within a pixel vector. In the OSP approach, such inter-band spectral
information is provided by the knowledge of d and U. In the PCLS methods, the
spectral information specified by $(M^T M)^{-1} M^T$ in (3.10) is known. By contrast, TSCSD
is a sample spectral correlation matrix-based analysis technique, which utilizes the
sample spectral correlation matrix $R_{L\times L}$ or the sample spectral covariance matrix
$K_{L\times L}$ to account for spectral correlation provided by sample pixel vectors. This can be
seen from (4.6), (4.10), (6.1), (6.4)-(6.7) where the detectors must calculate
$R_{L\times L}^{-1}$ or $K_{L\times L}^{-1}$. Nevertheless, all these detectors result in the same
functional form of a matched filter given by (3.6) with different matched signatures and scale
constants $\kappa$ that account for abundance estimation error. When a spectral matched filter is
used for TACSD, the inter-band spectral information specified by $P_U^\perp$ or
$(M^T M)^{-1} M^T$ is used to preprocess a data vector by eliminating undesired signatures before
a matched signature is applied. When a sample spectral correlation matrix-based matched filter is
used for TSCSD, the inverse of the sample spectral correlation/covariance matrix,
$R_{L\times L}^{-1}$ or $K_{L\times L}^{-1}$, is used to whiten a data vector prior to
application of a matched signature. It was shown in Chang (2002a) that OSP is a
whitened version of CEM through TCIMF with the inclusion of a scale constant $\kappa$ that
accounts for estimation error. This is because OSP does not estimate the abundance
fraction vector $\alpha$, but CEM does. On the other hand, it was also shown in Chang and
Heinz (2000) that the detection performance of TSCSD is also determined by q, the
number of eigenvectors used in the computation of $R_{L\times L}^{-1}$. This q is tied to the
scale constant $\kappa$. Therefore, with respect to the information being used in a matched
filter, TACSD and TSCSD can be viewed as completely different approaches in the sense that
the former uses $P_U^\perp$ or $(M^T M)^{-1} M^T$ for information preprocessing as opposed to the
latter using $R_{L\times L}^{-1}$ or $K_{L\times L}^{-1}$ for whitening the data. However, there
is a close link between these two. As discussed in Section 5.1 of Chapter 5 as well as in Chang
(2002b), a connection between TACSD and TSCSD is the relationship between $P_U^\perp$ and
$R_{L\times L}^{-1}$.

Basically, $P_U^\perp$ requires complete knowledge of the undesired target signature
matrix U, whereas $R_{L\times L}^{-1}$ can be computed directly from sample data vectors
without prior target knowledge. In order to obtain the prior information in $P_U^\perp$,
$R_{L\times L}^{-1}$ makes use of spectral correlation among sample vectors to approximate
$P_U^\perp$ in the sense of minimum least squares error. As a result, how well
$R_{L\times L}^{-1}$ approximates $P_U^\perp$ is determined by how accurately target information
is represented by $R_{L\times L}^{-1}$. This is in turn determined by the intrinsic dimensionality
(ID), which turns out to be the number of target signatures used in the target signature matrix
M. According to this analysis, two major factors have substantial impact on the detection
performance of TACSD and TSCSD. One is the target knowledge used in M and another is the number
of target signatures, p, which determines the number of eigenvectors, q, to be used to
compute $R_{L\times L}^{-1}$. So, in this chapter we investigate these two sensitivity issues in
subpixel target detection.

The first issue arises from the high spectral resolution of a hyperspectral sensor 
which can uncover many unknown target signatures in an image scene including 
background signatures, natural targets, interferers, clutter, etc. Under this circumstance, 
finding an appropriate M to well represent the image scene is crucial. But it is also very 
difficult because some of these target signatures cannot be identified a priori. In order to 
resolve this problem, three unsupervised algorithms developed in Chapter 5 can be used 
to find these target signatures in an unsupervised manner. As will be demonstrated in 
experiments, including these target signatures in M will significantly improve the 
performance of TACSD.

The second issue is to estimate an appropriate p, which determines the performance
of TSCSD. But this is closely related to the ID of the image data, viz. the number of image
endmembers in the data. Unfortunately, ID is generally not known in practice. As a
result, without knowing the exact value of q, calculating $R_{L\times L}^{-1}$ may be very tricky.
This is due to the fact that small eigenvalues dominate $R_{L\times L}^{-1}$ and differentiating
small eigenvalues from noise variances is considered to be very challenging in signal
processing. A similar problem also occurs in anomaly detection in Chapter 6 where the
detection performance is also heavily affected by the number of eigenvectors used to
calculate $R_{L\times L}^{-1}$ or $K_{L\times L}^{-1}$.
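One way to make the dependence on the number of eigenvectors concrete is to build the inverse from only the q largest eigenpairs, $\sum_{l=1}^{q} \lambda_l^{-1} v_l v_l^T$. The sketch below is an illustration under that assumption and is not necessarily the exact procedure used in the experiments of this chapter; the function name `inverse_from_q_eigenvectors` is hypothetical.

```python
import numpy as np

def inverse_from_q_eigenvectors(R, q):
    """Approximate R^-1 by summing (1/lambda_l) v_l v_l^T over the q largest eigenvalues."""
    lam, V = np.linalg.eigh(R)               # eigenvalues in ascending order
    keep = len(lam) - q                      # index of the first retained eigenpair
    lam_q, V_q = lam[keep:], V[:, keep:]     # the q largest eigenpairs
    return (V_q / lam_q) @ V_q.T             # scale each eigenvector column by 1/lambda

rng = np.random.default_rng(10)
X = rng.standard_normal((500, 6))
R = X.T @ X / 500
R_inv_full = inverse_from_q_eigenvectors(R, 6)   # all eigenvectors: the ordinary inverse
R_inv_q3 = inverse_from_q_eigenvectors(R, 3)     # only the three largest eigenpairs
print(np.allclose(R_inv_full, np.linalg.inv(R)))
```

Discarding the smallest eigenvalues removes the terms with the largest weights $\lambda_l^{-1}$, so the choice of q directly controls how much noise-dominated information enters the filter.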

There is another major distinction between TACSD and TSCSD. TACSD requires a
linear mixture model to estimate the abundance fractions of target signatures present in an
image scene. Therefore, it needs complete knowledge of target information in the
image data. By contrast, TSCSD is developed without knowledge of the image background.
So, no linear mixture model is required for TSCSD. However, both types of subpixel
target detection do share the same difficulty, i.e. solving an inverse problem. To invert a
linear mixing problem, TACSD requires that the number of target signatures present in
an image scene not be greater than ID to avoid over-representing the image data.
Correspondingly, TSCSD requires that the number of pixel samples be greater than ID to
avoid singularity in the computation of $R_{L\times L}^{-1}$ or $K_{L\times L}^{-1}$. In both
cases, the problem comes down to the determination of ID. In Chapter 17, we will introduce a new
concept, called virtual dimensionality (VD), which differs slightly from ID in the sense that
VD is defined by the number of spectrally distinct signal sources in image data. The VD may
provide a good estimate of how many eigenvectors, q, are required to calculate
$R_{L\times L}^{-1}$. In Chapter 17, this problem will be addressed and several methods to
estimate VD will be proposed for this purpose.
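The singularity issue can be seen in a small experiment. A sample correlation matrix formed from N pixel vectors has rank at most N, so with too few samples it cannot be inverted directly. The sketch below uses Gaussian random data in place of image pixels; the function name `sample_correlation_rank` and the band count are hypothetical choices for illustration.

```python
import numpy as np

def sample_correlation_rank(num_pixels, num_bands, seed=0):
    """Rank of the sample correlation matrix built from random pixel vectors."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((num_pixels, num_bands))
    R = X.T @ X / num_pixels
    return int(np.linalg.matrix_rank(R))

for n in (10, 200):
    # With fewer pixels than bands the 50 x 50 matrix is rank deficient.
    print(n, sample_correlation_rank(n, 50))
```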



7.2 SENSITIVITY OF TARGET KNOWLEDGE 

The techniques presented in Chapters 3 and 4 required different degrees of target 
knowledge. In order to conduct a fair comparative analysis, the same target information 
must be made available to these techniques despite the fact that TSCSD requires only the 
knowledge of targets of interest. To meet this need, the UNCLS algorithm in Section 5.4 
will be used to generate such unsupervised target information for a thorough comparative 
study where the five different subpixel target detection methods, OSP, TACSD (SCLS 
and NCLS) and TSCSD (CEM and TCIMF) are selected for evaluation. The term
"unsupervised target information or knowledge" refers to information or
knowledge that is obtained directly from the data by unsupervised means. The problem
of sensitivity to target knowledge will be addressed first, then noise sensitivity in the
computation of $\mathbf{R}_{L\times L}^{-1}$. The following computer simulations are particularly designed to
illustrate the sensitivity issue of target information used in TACSD and TSCSD.

The laboratory AVIRIS data set in Fig. 1.5 will be used to evaluate the performance
of OSP, SCLS, NCLS, CEM and TCIMF. The data set contains five field reflectance
spectra, dry grass, red soil, creosote leaves, blackbrush and sagebrush. In this case, the
complete prior target knowledge is the target signature matrix $\mathbf{M} = [\mathbf{m}_1\ \mathbf{m}_2\ \mathbf{m}_3\ \mathbf{m}_4\ \mathbf{m}_5]$
consisting of these five spectral signatures with their associated abundance fractions given
by $\boldsymbol{\alpha} = (\alpha_1, \alpha_2, \alpha_3, \alpha_4, \alpha_5)^T$. Unlike the experiments conducted in Chapter 3, which were
based on the complete knowledge of these five target signatures, the following
experiments assume that the target knowledge is either partially available or not
available at all a priori.

Two examples are considered to illustrate how the target information affects the 
subpixel target detection performance. The first example assumes that there is partial but 
not complete target knowledge available prior to detection. In this case, the UNCLS 
algorithm will be used to generate the necessary target information that was required for 
OSP, SCLS and NCLS target detectors. The second example makes an assumption that 
no prior target knowledge is available and all the required target information must be 
obtained directly from the unknown image data. Once again, the UNCLS algorithm is 
also used to generate the necessary target information for all the five methods, OSP, 
SCLS, NCLS, CEM and TCIMF. 

Example 7.1 Partial Target Knowledge 

The data used in this example were the same used in Section 4.5.1 where only three 
target signatures, creosote leaves, dry grass and red soil were used for illustration. 
Assume that the only target knowledge available to us was the creosote leaves. We
simulated 400 mixed pixel vectors, $\{\mathbf{r}_i\}_{i=1}^{400}$, as follows. We started the first pixel vector
with 100% red soil and 0% dry grass, then increased dry grass by 0.25% and
decreased red soil by 0.25% every pixel vector until the 400th pixel vector, which contained
100% dry grass. We then added creosote leaves to pixel vector numbers 198-202 at
abundance fractions of 10% while reducing the abundance of red soil and dry grass
accordingly. For example, after addition of creosote leaves, the resulting pixel vector 200
contained 10% creosote leaves, 45% red soil and 45% dry grass. White Gaussian noise
was also added to each pixel vector to achieve a 30:1 signal-to-noise ratio, defined
as 50% reflectance divided by the standard deviation of the noise in Harsanyi and Chang
(1994). The spectra of these 400 simulated pixel vectors shown in Fig. 4.9 are also
reproduced in Fig. 7.1 for convenience, where Fig. 7.1(a) shows the abundance of each
pixel at band 30 and Fig. 7.1(b) shows the abundance of each pixel averaged
over all 158 bands.
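The simulation just described can be sketched as follows. This is an illustration under stated assumptions, not the original experiment code: the three spectra are random placeholders rather than the AVIRIS signatures of Fig. 1.5, reflectance is taken to lie in [0, 1] so that 50% reflectance equals 0.5, the ramp is generated with an evenly spaced grid from 0% to 100%, and the background abundances are scaled proportionally when the target is implanted.

```python
import numpy as np

def simulate_example_pixels(red_soil, dry_grass, creosote, n_pixels=400,
                            target_slice=slice(197, 202), target_fraction=0.10,
                            snr=30.0, seed=0):
    """Return (pixels, abundances) for a red-soil/dry-grass ramp with an implanted target."""
    rng = np.random.default_rng(seed)
    grass = np.linspace(0.0, 1.0, n_pixels)        # 0% -> 100% dry grass
    soil = 1.0 - grass                             # 100% -> 0% red soil
    leaves = np.zeros(n_pixels)
    leaves[target_slice] = target_fraction         # pixels 198-202 in 1-based numbering
    # Assumption: background abundances are scaled so that every pixel sums to one.
    soil = soil * (1.0 - leaves)
    grass = grass * (1.0 - leaves)
    abundances = np.column_stack([soil, grass, leaves])
    signatures = np.vstack([red_soil, dry_grass, creosote])   # shape (3, L)
    clean = abundances @ signatures
    sigma = 0.5 / snr                              # SNR = 50% reflectance / noise std
    pixels = clean + rng.normal(0.0, sigma, clean.shape)
    return pixels, abundances

rng = np.random.default_rng(1)
m_soil, m_grass, m_leaves = rng.random((3, 158))   # placeholder spectra with 158 bands
pixels, abundances = simulate_example_pixels(m_soil, m_grass, m_leaves)
```

Returning the abundance matrix alongside the noisy pixels makes it easy to compare any detector output against the true fractions.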

(a) Abundance at band 30    (b) Average abundance over 158 bands

Figure 7.1. 400 simulated mixed pixel vectors used in Example 7.1

Since the creosote leaves signature was the only target knowledge available to us, we used it
as an initial target $\mathbf{t}_0$ to specify the signature $\mathbf{m}_0$ used in the UNCLS algorithm. The estimated
abundance fraction of $\mathbf{t}_0$ for each of the 400 simulated pixel vectors is denoted by $\hat{\alpha}_0^{(0)}(\mathbf{r}_i)$,
where $\mathbf{r}_i$ indicates the i-th simulated pixel vector with i = 1, 2, ..., 400. Using $\hat{\alpha}_0^{(0)}(\mathbf{r}_i)$,
we calculated the LSE between each simulated pixel vector $\mathbf{r}_i$, $1 \le i \le 400$, and
$\hat{\alpha}_0^{(0)}(\mathbf{r}_i)\mathbf{m}_0$. Since the resulting maximum LSE was not below a prescribed threshold,
UNCLS continued by searching for the pixel vector that yielded the maximum LSE. In
this example, the seventh pixel vector, $\mathbf{r}_7$, with 98.5% red soil, was selected as the first
target, denoted by $\mathbf{t}_1 = \mathbf{r}_7$, with its signature specified by $\mathbf{m}_1$. The UNCLS algorithm
was again used to estimate the abundance fractions of $\mathbf{t}_0$ and $\mathbf{t}_1$, denoted by $\hat{\alpha}_0^{(1)}(\mathbf{r}_i)$
and $\hat{\alpha}_1^{(1)}(\mathbf{r}_i)$, for each of the 400 simulated pixel vectors with i = 1, 2, ..., 400. Using the
estimated abundance fractions, we calculated the LSE between each simulated pixel vector
$\mathbf{r}_i$, $1 \le i \le 400$, and the least squares linear mixture, $\hat{\alpha}_0^{(1)}(\mathbf{r}_i)\mathbf{m}_0 + \hat{\alpha}_1^{(1)}(\mathbf{r}_i)\mathbf{m}_1$. Because
the resulting maximum LSE was still not below the prescribed threshold, the UNCLS
algorithm was continued and the 400th pixel vector, $\mathbf{r}_{400}$, with 100% dry grass, was selected as
a second target $\mathbf{t}_2 = \mathbf{r}_{400}$ with its signature specified by $\mathbf{m}_2$. After finding $\mathbf{t}_2$, the
resulting maximum LSE was below the prescribed threshold and the UNCLS algorithm
was terminated. At this point, two targets, $\mathbf{t}_1 = \mathbf{r}_7$ = red soil and $\mathbf{t}_2 = \mathbf{r}_{400}$ = dry grass,
were identified, which were not known a priori. Using these three target signatures, $\mathbf{m}_0$,
$\mathbf{m}_1$ and $\mathbf{m}_2$, to form a desired target signature matrix M, Fig. 7.2(a-c) shows the
detection results of OSP, SCLS, NCLS, CEM and TCIMF, where creosote leaves, red
soil and dry grass were used as the desired signatures and detected, respectively.
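The iterative search described in this example can be sketched in a few lines. The sketch is a simplified stand-in rather than the UNCLS algorithm of Section 5.4 itself: the non-negatively constrained abundance estimate is obtained by a small exhaustive-support least squares routine (`nnls_small`, a hypothetical helper that is exact but only practical for a handful of targets), the spectra are random placeholders, and the data are noise-free so that the outcome is easy to follow.

```python
import itertools
import numpy as np

def nnls_small(M, r):
    """Non-negative least squares for a few columns, by enumerating supports."""
    p = M.shape[1]
    best_x, best_err = np.zeros(p), float(r @ r)   # all-zero solution as the baseline
    for size in range(1, p + 1):
        for support in itertools.combinations(range(p), size):
            cols = list(support)
            coef, *_ = np.linalg.lstsq(M[:, cols], r, rcond=None)
            if np.all(coef >= 0):
                err = float(np.sum((r - M[:, cols] @ coef) ** 2))
                if err < best_err:
                    best_x = np.zeros(p)
                    best_x[cols] = coef
                    best_err = err
    return best_x, best_err

def uncls_style_search(pixels, initial_target, threshold, max_targets=6):
    """Grow a target list by repeatedly adding the pixel with the largest least squares error."""
    targets = [np.asarray(initial_target, dtype=float)]
    picked = []
    while len(targets) < max_targets:
        M = np.column_stack(targets)
        errors = np.array([nnls_small(M, r)[1] for r in pixels])
        worst = int(np.argmax(errors))
        if errors[worst] < threshold:              # every pixel is explained well enough
            break
        targets.append(pixels[worst])
        picked.append(worst)
    return np.column_stack(targets), picked

rng = np.random.default_rng(0)
m_soil, m_grass, m_leaves = rng.random((3, 50))    # placeholder spectra with 50 bands
grass = np.linspace(0.0, 1.0, 400)
leaves = np.zeros(400)
leaves[197:202] = 0.10
abund = np.column_stack([(1 - grass) * (1 - leaves), grass * (1 - leaves), leaves])
pixels = abund @ np.vstack([m_soil, m_grass, m_leaves])   # noise-free mixtures
M, picked = uncls_style_search(pixels, m_leaves, threshold=1e-8)
```

In this noise-free setting the two pixels found are the pure red soil and pure dry grass ends of the ramp. With noise added, as in the example above, a near-pure pixel such as the seventh one can be selected instead, and the threshold has to be set relative to the noise level.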
(a) Detection of creosote leaves
(b) Detection of red soil
(c) Detection of dry grass

Figure 7.2. Comparative results of OSP, SCLS, NCLS, CEM and TCIMF

As we can see from Fig. 7.2(a-c), OSP and the two PCLS detectors, SCLS
and NCLS, performed similarly. NCLS was the best among these three detectors in the
sense that the abundance fractions of the desired target signatures were detected very
accurately. On the other hand, OSP was the worst because the detected abundance
fractions of the desired target signatures were not correct, even though it also detected the
desired target signatures. CEM and TCIMF performed very differently from OSP, SCLS
and NCLS. Both CEM and TCIMF correctly detected creosote leaves in Fig. 7.2(a), but
they had completely different performance in detecting red soil and dry grass. The
amounts of red soil detected in Fig. 7.2(b) by the CEM filter fluctuated, starting from 1
and gradually decreasing to -0.5. Interestingly, it also detected the creosote leaves at
pixels 198-202 but with large negative amounts. A similar phenomenon in detection of
dry grass was also observed in Fig. 7.2(c) for CEM. The most intriguing case is the
performance of TCIMF. Since TCIMF constrained the particular targets $\mathbf{t}_1 = \mathbf{r}_7$ and $\mathbf{t}_2 = \mathbf{r}_{400}$ while
eliminating all other target pixels, it only detected a single red soil pixel, $\mathbf{r}_7$, in Fig.
7.2(b) and a single dry grass pixel, $\mathbf{r}_{400}$, in Fig. 7.2(c).

Example 7.2 No Target Knowledge Available A Priori 

The only difference between this example and Example 7.1 is that no initial target
signature was given a priori. The initial target $\mathbf{t}_0$ must be generated from the data set. In
this case, we selected the pixel vector with maximum length, which turned out to be the
400th pixel vector with 100% dry grass, $\mathbf{r}_{400}$. Using $\mathbf{t}_0 = \mathbf{r}_{400}$ to initialize the UNCLS
algorithm and following the same procedure as in Example 7.1, we found the fourth pixel
vector, $\mathbf{r}_4$, with 99.25% red soil, to be the first target $\mathbf{t}_1$ and the 200th pixel
vector, $\mathbf{r}_{200}$, with 10% creosote leaves, to be the second target $\mathbf{t}_2$ before the
UNCLS algorithm was terminated. The signatures of the three targets, $\mathbf{m}_0$, $\mathbf{m}_1$ and $\mathbf{m}_2$,
were used to form a desired target signature matrix M. It should be noted that the targets
generated by UNCLS in this example, $\mathbf{r}_4$ and $\mathbf{r}_{200}$, were different from $\mathbf{r}_7$ and $\mathbf{r}_{400}$
generated in Example 7.1.
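Selecting the pixel vector with maximum length as the initial target amounts to a single norm computation. The snippet below is a minimal sketch with a hypothetical function name and a toy three-pixel data set.

```python
import numpy as np

def brightest_pixel_index(pixels):
    """Index of the pixel vector (row) with the largest Euclidean length."""
    return int(np.argmax(np.linalg.norm(pixels, axis=1)))

pixels = np.array([[0.2, 0.1, 0.3],
                   [0.6, 0.7, 0.5],
                   [0.4, 0.4, 0.4]])
t0_index = brightest_pixel_index(pixels)   # the second row has the largest norm
t0 = pixels[t0_index]                      # used as the initial target signature
```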




(a) Detection of the first found target, dry grass
(b) Detection of the second found target, red soil
(c) Detection of the third found target, creosote leaves

Figure 7.3. Comparative results of OSP, SCLS, NCLS, CEM and TCIMF with no prior target knowledge



Fig. 7.3(a-c) shows the detection results produced by OSP, SCLS, NCLS, CEM
and TCIMF, where only the OSP result is shown on a different scale. Once again,
NCLS performed the best among all five detectors, but it did not perform as well as
it did in Fig. 7.2 in terms of estimating abundance fractions due to the lack of prior
information about target signatures. Similar observations also applied to OSP and
SCLS. Under this circumstance OSP, SCLS and NCLS behaved more like target
detectors than like the target abundance estimators seen in Fig. 7.2. Interestingly, the
detection result of red soil in Fig. 7.3(b) produced by CEM looked like an upside-down
version of that in Fig. 7.2(b) produced by CEM, with slightly different magnitudes. In
detection of creosote leaves, the results produced by CEM in Figs. 7.2(a) and 7.3(c)
looked similar but the detected abundance fractions were different. In detection of dry
grass, CEM produced nearly the same results in both cases. As in Example 7.1, TCIMF only
detected the single target pixels on which it was constrained. If we compare the detection result of
the creosote leaves produced by TCIMF in Fig. 7.2(a) to that in Fig. 7.3(c), TCIMF
detected creosote leaves at pixel numbers 198-202 in Fig. 7.2(a) but only
at $\mathbf{r}_{200}$ in Fig. 7.3(c). This is because the former used the
precise knowledge of the creosote leaves to produce the result in Fig. 7.2(a), while the
latter used only $\mathbf{r}_{200}$ as the desired target knowledge to produce the result in Fig. 7.3(c).



7.3 SENSITIVITY OF NOISE 

In this section, we investigate the noise sensitivity issue for TSCSD. Since this
issue is closely related to the number of eigenvectors, q, that is used to calculate $\mathbf{R}_{L\times L}^{-1}$,
we will deal with the effect caused by q rather than noise directly in the
following experiments. In order to completely characterize the impacts resulting from q, we
also make an assumption that the complete target knowledge is known a priori.

7.3.1 TSCSD 

According to (4.6) and (4.10), CEM and TCIMF share the same detector structure
that calculates $\mathbf{R}_{L\times L}^{-1}$. As expected, their performance will be very similar. Only in some
cases may TCIMF take advantage of partial target knowledge to outperform CEM.
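The shared structure can be seen by writing both filters in terms of the inverse sample correlation matrix. The sketch below uses the commonly stated closed forms of the two filters, which are assumed here to correspond to (4.6) and (4.10); the function names, the random placeholder spectra and the simulated pixels are hypothetical.

```python
import numpy as np

def cem_weights(R_inv, d):
    """CEM filter: minimize the output energy subject to w^T d = 1."""
    Rd = R_inv @ d
    return Rd / (d @ Rd)

def tcimf_weights(R_inv, D, U):
    """TCIMF filter: unit response to the columns of D, zero response to the columns of U."""
    T = np.column_stack([D, U])
    c = np.concatenate([np.ones(D.shape[1]), np.zeros(U.shape[1])])
    RT = R_inv @ T
    return RT @ np.linalg.solve(T.T @ RT, c)

rng = np.random.default_rng(0)
L = 30
signatures = rng.random((L, 3))                       # placeholder spectra stored as columns
abund = rng.dirichlet(np.ones(3), size=500)           # random abundances that sum to one
pixels = abund @ signatures.T + rng.normal(0, 0.01, (500, L))
R_inv = np.linalg.inv(pixels.T @ pixels / len(pixels))

d = signatures[:, 0]                                  # desired target signature
w_cem = cem_weights(R_inv, d)
w_tcimf = tcimf_weights(R_inv, signatures[:, :1], signatures[:, 1:])
cem_out = pixels @ w_cem                              # detector output for every pixel
tcimf_out = pixels @ w_tcimf
```

Both weight vectors are built from the same inverse matrix; TCIMF differs only in that the undesired signatures enter the constraint, so replacing the inverse by one built from q eigenvectors affects the two filters in the same place.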

Example 7.3 Target Signatures with Relatively Large Abundance Fractions 

Unlike the two examples considered in Section 7.2, we simulated 400 mixed pixel
vectors, $\{\mathbf{r}_i\}_{i=1}^{400}$, using all five field reflectance spectra, dry grass, red soil, creosote
leaves, blackbrush and sagebrush, shown in Fig. 1.5. They were divided into four groups,
each of which contained a hundred pixel vectors with the same mixture. The first group
consisted of the first hundred pixel vectors with the mixture made up of 50% sagebrush
and 50% dry grass. The second group consisted of the second hundred pixel vectors with
the mixture made up of 50% sagebrush and 50% red soil. The third group consisted of
the third hundred pixel vectors with the mixture made up of 50% sagebrush and 50%
creosote leaves. The fourth group consisted of the fourth hundred pixel vectors with the
mixture made up of 50% sagebrush and 50% blackbrush. More specifically, each of the
400 simulated pixel vectors was a two-component mixture with a 50-50 split, and all of the
400 pixel vectors shared the same amount of sagebrush, that is, 50% sagebrush. White
Gaussian noise was added to each pixel vector to achieve a 30:1 signal-to-noise ratio as
defined in Example 7.1. Fig. 7.4 shows these 400 simulated pixels.




(a) Abundance of each pixel at band 30 (b) Abundance of each pixel averaged over 158 bands 

Figure 7.4. 400 simulated mixed pixel vectors used in Example 7.3 



Fig. 7.5(a-e) shows the results of OSP, SCLS, NCLS, CEM and TCIMF in 
detection of dry grass, red soil, creosote leaves, blackbrush and sagebrush respectively. 




(a) Detection of dry grass
(b) Detection of red soil
(c) Detection of creosote leaves
(d) Detection of blackbrush
(e) Detection of sagebrush

Figure 7.5. Results of OSP, SCLS, NCLS, CEM and TCIMF in detection of dry grass, red soil, creosote leaves,
blackbrush and sagebrush



Due to the nature of unconstrained abundance, the abundance fractions detected in Fig.
7.5 had a wide range of values. In order to display the detection results for visual
inspection, two different scales were used in Fig. 7.5, one for OSP and another for SCLS,
NCLS, CEM and TCIMF. As shown, OSP, SCLS and NCLS were able to detect all
five signatures, but OSP did not perform well in detection of sagebrush. However, if
we examine the amounts of their detected target abundance fractions, the one detected by
OSP did not reflect the true abundance, but those detected by SCLS and NCLS did. This
is because OSP is unconstrained, whereas both SCLS and NCLS are at least partially
constrained. Surprisingly, CEM, which was shown to be effective in Harsanyi (1993),
Harsanyi et al. (1994) and Farrand and Harsanyi (1997), performed poorly in this
example. Similarly, TCIMF did not perform well either in detection of these five targets.
Nevertheless, it did show high peaks when it detected desired targets, compared to CEM,
which indicated no sign of detecting the five targets. This is because the q used to
calculate $\mathbf{R}_{L\times L}^{-1}$ in (4.6) is crucial. So, in order to see how q affected the detection
performance, experiments were conducted for CEM and TCIMF using q = 2, 3, 5, 10, 60 to
detect blackbrush, and the results are shown in Fig. 7.6(a-b). As we can see from Fig.
7.6, CEM detected the blackbrush when q = 5 and 10. TCIMF also detected blackbrush
at q = 60 and performed better than when it used all bands in Fig. 7.5(d).
Interestingly, when q was too small, such as 2 or 3, both CEM and TCIMF detected the wrong
targets: pixels whose mixtures were creosote leaves and sagebrush resulted in larger values than
those whose mixtures were blackbrush and sagebrush. This experiment demonstrated the
significance of q in the computation of $\mathbf{R}_{L\times L}^{-1}$.




(a) CEM
(b) TCIMF

Figure 7.6. Results of CEM and TCIMF using q = 2, 3, 5, 10, 60 with blackbrush as the desired signature



Example 7.4 Target Signatures with Small Amount of Abundance Fractions 

The experiments conducted in this example provide a different result for CEM and
TCIMF. The data used were the same set of 400 simulated pixels shown in Fig. 7.1
in Example 7.1, but the complete knowledge of creosote leaves, dry grass and red soil
was assumed to be known. Fig. 7.7 shows the results of OSP, SCLS, NCLS, CEM and
TCIMF in detection of creosote leaves (note that only the OSP result is shown on a different
scale), which were very close to those in Fig. 7.2(a). Unlike Example 7.3, this time all
five methods, OSP, SCLS, NCLS, CEM and TCIMF, produced comparable detection
results. However, OSP was the worst in terms of detecting the true abundance fractions
of the creosote leaves, whereas SCLS and NCLS were among the best.




OSP SCLS NCLS CEM TCIMF 

Figure 7.7. Results of OSP, SCLS, NCLS, CEM and TCIMF in detection of creosote leaves 



In order to see the effects resulting from different values of q used to calculate $\mathbf{R}_{L\times L}^{-1}$
in CEM and TCIMF, Fig. 7.8(a-b) shows the results of CEM and TCIMF using q = 2,
3, 5, 10, 60. Compared to the results produced using q = 158 in Fig. 7.7, the abundance
of the creosote leaves detected in Fig. 7.8 for CEM and TCIMF was more accurate.
Interestingly, when q = 2, TCIMF missed the target, while CEM extracted creosote
leaves but also extracted dry grass across the entire range. For CEM, q = 60 yielded the
best result, where the estimated abundance of creosote leaves was nearly accurate. But
even in this case, the result was still not as good as that produced by SCLS and NCLS
in Fig. 7.7 in the sense of its performance in detecting abundance fractions of other
pixels.




(a) CEM
(b) TCIMF

Figure 7.8. Results of (a) CEM and (b) TCIMF using the number of eigenvectors q = 2, 3, 5, 10, 60 in the
detection of creosote leaves.

TCIMF performed very well except for q = 2. In particular, it detected nearly
correct abundance fractions of the creosote leaves at q = 60.

While Example 7.3 shows one extreme case for CEM and TCIMF, Example 7.4
offers the other extreme. Both examples further demonstrate the
crucial role that the number of eigenvectors, q, plays in the performance of CEM and TCIMF.
If each eigenvector is interpreted as a piece of information, the larger its eigenvalue is,
the more significant the information it represents. So, these two examples suggest that when
the desired target is small or occurs with low probability, the number of eigenvectors to
be used for q is generally very high because targets with small abundance fractions may
correspond to small eigenvalues and can then be viewed as insignificant targets. Under
this circumstance, they may not be detectable using only a few eigenvectors.
Therefore, a large set of eigenvectors is required to find these targets. This explains why
CEM and TCIMF can be used to detect small targets so effectively using full bands.
Conversely, if the desired targets are relatively large and widespread like the one studied
in Example 7.3, a smaller q may be more appropriate to make CEM and TCIMF
effective because the information provided by these targets can be well represented by a
few eigenvectors associated with the largest eigenvalues. In this case, a small set of such
eigenvectors may be sufficient to detect these targets.
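The role of q in these two examples can be made explicit with a short worked equation. It assumes, in line with the discussion above but not stated as a formula in this section, that using q eigenvectors means keeping the q eigenpairs of the sample correlation matrix with the largest eigenvalues; the symbols $\lambda_l$, $\mathbf{v}_l$ and $\mathbf{R}_q^{-1}$ are introduced here only for illustration, and the CEM expression is the commonly stated form of the filter output for a desired signature $\mathbf{d}$.

```latex
\[
\mathbf{R}_{L\times L} = \sum_{l=1}^{L} \lambda_l \mathbf{v}_l \mathbf{v}_l^T,
\qquad \lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_L > 0,
\qquad
\mathbf{R}_{q}^{-1} = \sum_{l=1}^{q} \lambda_l^{-1} \mathbf{v}_l \mathbf{v}_l^T
\]
\[
\delta_q^{\mathrm{CEM}}(\mathbf{r})
  = \frac{\mathbf{d}^T \mathbf{R}_q^{-1} \mathbf{r}}{\mathbf{d}^T \mathbf{R}_q^{-1} \mathbf{d}}
  = \frac{\sum_{l=1}^{q} \lambda_l^{-1} (\mathbf{v}_l^T \mathbf{d}) (\mathbf{v}_l^T \mathbf{r})}
         {\sum_{l=1}^{q} \lambda_l^{-1} (\mathbf{v}_l^T \mathbf{d})^2}
\]
```

Under this reading, a target whose signature projects mainly onto eigenvectors with index larger than q drops out of both sums, which is consistent with small targets requiring a large q. Conversely, the terms with small $\lambda_l$ carry the large weights $\lambda_l^{-1}$, so including many of them lets noise-level directions dominate the output when the target is already well represented by the leading eigenvectors.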

Example 7.5 Target Signatures Used as Interferers 

The same simulated data used in Example 7.4 are also used in this example, except
that two additional signatures, blackbrush and sagebrush, were assumed to be
present in the data even though they were actually not present. In this case, the signature
matrix $\mathbf{M} = [\mathbf{m}_1\ \mathbf{m}_2\ \mathbf{m}_3\ \mathbf{m}_4\ \mathbf{m}_5]$ contained the five signatures, dry grass, red soil,
creosote leaves, blackbrush and sagebrush. With this scenario, the blackbrush and
sagebrush acted as interferers rather than target signatures. Fig. 7.9 shows the results of
OSP, SCLS, NCLS, CEM and TCIMF in detection of creosote leaves.




OSP SCLS NCLS CEM TCIMF 

Figure 7.9. Results of OSP, SCLS, NCLS, CEM and TCIMF in detection of creosote leaves with blackbrush
and sagebrush acting as interferers.

Unlike Fig. 7.7, the performances of the five detectors were quite different. The
worst performance was produced by OSP. This is because the spectra of the two newly
added interferers, blackbrush and sagebrush, were very similar to that of
creosote leaves, and the detection capability of OSP was considerably reduced by the
undesired signature annihilator $P_{\mathbf{U}}^{\perp}$. Similarly, SCLS and NCLS also suffered from the
same problem. Nevertheless, NCLS still managed to perform reasonably well in spite of
a slight degradation in detection of creosote leaves at pixel 200. Further, it actually did
better than it did in Fig. 7.7 at other pixels by nulling out the abundance of creosote
leaves. For CEM, the result was identical to that in Fig. 7.7 because the addition of
blackbrush and sagebrush with zero abundance did not affect the output energy of CEM.
But it did make a difference for TCIMF since it eliminated the creosote leaves, whose
signature was similar to the two interferers. Its performance was very similar to that
produced by SCLS. However, when we conducted experiments using different values for q in
TCIMF, the results shown in Fig. 7.10 were surprising. TCIMF actually detected the
creosote leaves effectively for q = 2, 3, 5, 10, even for the case q = 2 where it detected the
target with negative amounts. Fig. 7.10 further shows an advantage of TCIMF over
CEM and a major difference between TACSD and TSCSD.
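The effect of similar interferers on OSP can be illustrated with the undesired signature annihilator itself. The sketch below assumes the usual projector form $\mathbf{I} - \mathbf{U}\mathbf{U}^{\#}$, with $\mathbf{U}^{\#}$ the pseudo-inverse of the undesired signature matrix; the function names and the random placeholder spectra are hypothetical.

```python
import numpy as np

def annihilator(U):
    """Projector onto the orthogonal complement of the column space of U."""
    return np.eye(U.shape[0]) - U @ np.linalg.pinv(U)

def osp_output(d, U, pixels):
    """Unnormalized OSP detector d^T P r for each pixel (rows of pixels)."""
    return pixels @ (annihilator(U) @ d)

rng = np.random.default_rng(0)
L = 40
d = rng.random(L)                                        # placeholder desired signature
background = rng.random((L, 2))                          # placeholder undesired signatures
similar = d[:, None] + 0.02 * rng.normal(size=(L, 2))    # interferers close to d

P_few = annihilator(background)
P_many = annihilator(np.column_stack([background, similar]))
energy_few = d @ P_few @ d                               # target energy left after annihilation
energy_many = d @ P_many @ d                             # much smaller once similar interferers are removed
```

Because the annihilator removes everything in the span of the undesired signatures, adding interferers that resemble the desired signature leaves very little of the target for the matched filter stage to respond to.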

Figure 7.10. Results of TCIMF using q = 2, 3, 5, 10, 60 in the detection of creosote leaves with blackbrush
and sagebrush acting as interferers.



7.3.2 Hyperspectral Image Experiments 

The computer simulations conducted in the previous section concluded that the 
sensitivity of target knowledge had significant impact on detection performance. In this 
section, real hyperspectral image experiments will provide further evidence on this issue. 

7.3.2.1 AVIRIS Data

The hyperspectral data used in the following examples are AVIRIS data shown in 
Figure 1.6(a). 

Example 7.6 (complete target knowledge) 

In this example, the target knowledge is assumed to be known a priori, where the
signatures of the five targets, cinders, playa, rhyolite, shade and vegetation, were directly
extracted from the image scene in the same way as in Harsanyi and Chang
(1994). Figure 7.11 shows the results of OSP, SCLS, NCLS, CEM and TCIMF using
all eigenvectors, q = 158, where the U used in TCIMF was made up of the other four
target signatures, excluding the desired target signature.

From the images in Fig. 7.11, we see that NCLS performed best in detection of all
five targets. In order to see how the number of eigenvectors, q, affected the
performance of CEM and TCIMF, five different numbers, q = 5, 10, 20, 40, 80, were used in
the CEM and TCIMF implementations, and the results are shown in Figs. 7.12 and 7.13.










As shown in Figs. 7.12-7.13, TCIMF performed better than CEM in detection of all
five targets with q = 80. On the other hand, CEM seemed to do better than TCIMF
when q was small. But in both cases, they did not perform well and their performance
was worse than that of CEM and TCIMF using q = 158 as shown in Fig. 7.11.

Example 7.7 (one target signature known a priori) 

In Example 7.6, the complete knowledge of all five target signatures was known
a priori. In this example, we assumed that only partial target knowledge was available,
namely the signature of the playa. The UNCLS algorithm was then used to generate five
more unknown targets, which were identified as $\mathbf{t}_1$ = cinders, $\mathbf{t}_2$ = anomaly, $\mathbf{t}_3$ =
vegetation, $\mathbf{t}_4$ = shade and $\mathbf{t}_5$ = rhyolite, with $\mathbf{t}_0$ = playa. Fig. 7.14 shows the comparative
results of OSP, SCLS, NCLS, CEM and TCIMF, where the U used in TCIMF was made
up of the remaining five target signatures, excluding the desired target signature.
Interestingly, the third detected target was a playa pixel located at the upper left corner
edge of the dry lakebed. This pixel vector was an anomaly, which was missed in
Example 7.6 because it was very difficult to identify by visual inspection of the
image scene.

Comparing Fig. 7.14 to Fig. 7.11, all five methods performed slightly worse
than did their counterparts using full target knowledge. However, this experiment
demonstrated a potential advantage of the UNCLS algorithm: it can be used to detect
anomalies.

As in Example 7.6, we also examined the effects of q = 5, 10, 20, 40, 80 for CEM and
TCIMF. Figs. 7.15 and 7.16 show the results of CEM and TCIMF, respectively.




Figure 7.15. CEM using $\mathbf{t}_0$ = playa as a known target signature for q = 5, 10, 20, 40, 80

Example 7.8 (no a priori target knowledge)

Finally, we conclude with the case in which no prior target knowledge is
given. In this particular situation, the UNCLS algorithm was used to generate 6 targets
directly from the scene. They were identified as $\mathbf{t}_0$ = playa, $\mathbf{t}_1$ = cinders,
$\mathbf{t}_2$ = anomaly, $\mathbf{t}_3$ = vegetation, $\mathbf{t}_4$ = shade and $\mathbf{t}_5$ = rhyolite, where, except for the
initial target $\mathbf{t}_0$, the other five target pixels were the same pixels identified in Example
7.7. Fig. 7.17 shows the results of OSP, SCLS, NCLS, CEM and TCIMF using these 6
targets as target information to detect cinders, anomaly, vegetation, shade and rhyolite in
the image scene.









OSP SCLS NCLS CEM TCIMF

Figure 7.17. Comparative results of OSP, SCLS, NCLS, CEM and TCIMF with no a priori target knowledge
using q = 158



Since no prior knowledge was assumed in this example, the initial target pixel $\mathbf{t}_0$
must be generated from the image scene. As can be expected, this $\mathbf{t}_0$ would not be the
same as the initial target used in Example 7.7, which was assumed to be known a priori and
was obtained by averaging a large area in the dry lake. Instead, it was a single playa pixel
extracted from the dry lake. As a result, CEM performed differently from that in Example 7.7
only in detection of playa.

If we compare Fig. 7.17 to Fig. 7.11, OSP, SCLS and NCLS with no a priori
target information performed slightly worse than their counterparts with complete target
knowledge, but their results were still comparable. However, CEM and TCIMF did not
perform well in detection of cinders, playa, shade and rhyolite, but were effective in
detection of vegetation and anomaly. This was because CEM and TCIMF were very
sensitive to the target information used in these detectors. In Fig. 7.17, only a single
pixel for each target was used for information. So, CEM and TCIMF worked well for
small targets such as vegetation and anomaly, but not for large areas such as cinders,
playa, shade and rhyolite, which cover a wide range of spectral variation.

To evaluate effects of q on the performance of CEM and TCIMF, the experiments 
were also conducted for q = 5, 10, 20, 40, 80. Fig. 7.18 shows only results of playa 
since the information used for other five targets in CEM was the same used in Example 
7.7. In comparison between Fig. 7.18 and Fig. 7.12(playa), the CEM using a single 
playa pixel performed much worse than the CEM using the information obtained by 
averaging a large number of pixels located in the dry lake. 




Unlike CEM, the TCIMF in this example performed very differently from the
TCIMF in Example 7.7 because the undesired target signature matrix U was made up of
a single playa pixel, not the averaged playa signature used in Example 7.7. Fig. 7.19
shows the results of TCIMF in detection of the 6 target signatures, playa, cinders,
anomaly, vegetation, shade and rhyolite.

From Fig. 7.19, the performance of TCIMF was very similar to that in Example 7.7
except for the case of playa, where TCIMF performed much worse than in Example 7.7
due to insufficient target information represented by a single playa pixel.

7.3.2.2 HYDICE Data

In Chapters 3 and 4, the HYDICE experiments were conducted based on the
complete knowledge of the 15 panels in the image scene in Fig. 1.7(a). This section
investigates the issue of sensitivity to target knowledge with only partial or no prior
knowledge available, as well as the issue of noise sensitivity.

Example 7.9 

In this example, we studied the case when the only available target knowledge was
$\mathbf{t}_0$, which was chosen to be the first center panel pixel in row 1, $p_{11}$. Then the UNCLS
algorithm in Section 5.4 was used to generate 34 unknown target pixels in the image
scene among which 6 target pixels were identified by ground truth as =p„, 

= interferer, = p^^, = P 3 ,, = p^^, = p^^. Fig. 7.20(a-f) shows the detection 

results of these 6 targets using OSP, SCLS, NCLS, CEM and TCIMF. As we can see 
from the images, only CEM and TCIMF detected tiny amounts of p^ 3 , while the other 
three missed it because of its small amount of abundance. In addition, NCLS, CEM and 
TCIMF were among the best and performed very similarly except that NCLS has some 
difficulty with classifying panels in rows 2 and 3, panels in rows 4 and 5 in Fig. 
7.20(c,d,f). The OSP and the SCLS were the worst, but they still managed to classify 
panels in five rows except p ^3 and the interferer. 






Figure 7.20. Results of OSP, SCLS, NCLS, CEM and TCIMF using only the target information $\mathbf{t}_0 = p_{11}$



In order to see how q affects the performance of CEM and TCIMF, Figs. 7.21 and
7.22 show the detection results of the same 6 targets obtained in Fig. 7.20 for CEM and
TCIMF, respectively, using q = 5, 10, 20, 40 and 80.

As demonstrated, the larger q was, the better the detection was. Both CEM and
TCIMF performed very similarly except for the case of q = 80 in detection of panels in
rows 3-5, shown in the images labeled (d-f), where TCIMF eliminated more interference
than did CEM.

Example 7.10 

Example 7.9 assumed that the initial target $\mathbf{t}_0$ was selected by partial knowledge. If
we further assume that no such partial knowledge is available, the UNCLS algorithm
must be used to generate all necessary targets directly from the image scene, including the
initial target, $\mathbf{t}_0$. Once again, 34 unknown target pixels including $\mathbf{t}_0$ were generated.
Among them six target pixels were identified by ground truth, which were = 
interferer, t, = = Pj,, = p„, = p,, and = p^,. Since these 6 target 

pixels were the same 6 target pixels generated in Example 7.9, the classification results
of all five methods were identical to those in Figs. 7.19-7.21. This example
demonstrated that completely unsupervised classification may perform as well as
supervised classification as long as the desired target information obtained from the
image scene is sufficiently accurate. For more details of the comparative analysis between
NCLS and CEM, we refer to Chang and Heinz (2000a).



7.4 SENSITIVITY OF ANOMALY DETECTION 

As demonstrated in Fig. 6.1(c), RXD was very effective in detecting a single two-pixel
anomalous target because the ratio of the target size to the entire image, 2:40000, is
very small. However, if the anomaly is not small enough compared to its surroundings,
RXD may not work well. To see this, we consider the following example.

Example 7.11 

RXD using $\mathbf{K}_{L\times L}^{-1}$ and $\mathbf{R}_{L\times L}^{-1}$ was applied to smaller portions of the LCVF
image scene, with sizes from 60x60 down to 20x20. In this case, the ratio of the
anomaly to the image changed from 2:3600 to 2:400. Fig. 7.22(a-b) provides the
detection results of RXD using $\mathbf{K}_{L\times L}^{-1}$ and $\mathbf{R}_{L\times L}^{-1}$, where the detection performance of the
anomaly was gradually reduced.




The detection performance of RXD was even worse when it was applied to the same 
400-simulated data points considered in Example 7.1 with the creosote leaves as 5-pixel 
anomaly where the ratio of the anomaly to the data set is 5:400. As shown in Fig. 7.24, 
RXD completely failed to detect the creosote leaves. 
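The dependence of RXD on the relative size of the anomaly can be reproduced with a small numerical sketch. It assumes the usual forms of the detector, $(\mathbf{r}-\boldsymbol{\mu})^T\mathbf{K}^{-1}(\mathbf{r}-\boldsymbol{\mu})$ with the sample covariance matrix and $\mathbf{r}^T\mathbf{R}^{-1}\mathbf{r}$ with the sample correlation matrix; the function name, the Gaussian background and the implanted anomaly are hypothetical.

```python
import numpy as np

def rxd_scores(pixels, use_covariance=True):
    """RXD-style scores: (r - mu)^T K^{-1} (r - mu), or r^T R^{-1} r when use_covariance is False."""
    n = pixels.shape[0]
    if use_covariance:
        centered = pixels - pixels.mean(axis=0)
        matrix = centered.T @ centered / n             # sample covariance matrix K
    else:
        centered = pixels
        matrix = pixels.T @ pixels / n                 # sample correlation matrix R
    solved = np.linalg.solve(matrix, centered.T)       # matrix^{-1} applied to every pixel
    return np.einsum("ij,ji->i", centered, solved)

rng = np.random.default_rng(0)
L = 10
background = rng.normal(0.5, 0.05, (2000, L))          # hypothetical background pixels
anomaly = background[:1] + 0.5 * np.eye(L)[0]          # one pixel shifted in a single band
few = np.vstack([background, anomaly])                 # 1 anomalous pixel among 2001
many = np.vstack([background, np.repeat(anomaly, 400, axis=0)])   # 400 among 2400

score_few = rxd_scores(few)[-1]                        # score of the anomalous pixel
score_many = rxd_scores(many)[-1]
```

When the anomalous pixels make up a sizable share of the data they pull the sample mean and covariance toward themselves, so the score of the same pixel drops sharply compared with the case where it is rare.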




The above experiments simply suggest that in order for RXD to be effective, the 
anomalous targets must be sufficiently small. To illustrate this, we further expand the 400 
pixel vectors simulated in Example 7.1 to 4000 mixed pixel vectors, $\{r_i\}_{i=1}^{4000}$, as 
follows. We start the first pixel vector with 100% red soil and 0% dry grass, then increase 
dry grass by 0.025% and decrease red soil by 0.025% with every pixel vector until the 
4000th pixel vector, which contains 100% dry grass. We then added three anomalous 
targets to these 4000 simulated pixel vectors. The first anomalous target, creosote leaves, 
was added to pixel vectors 1998-2002, $\{r_i\}_{i=1998}^{2002}$, at abundance fractions 10% 
while reducing the abundance of red soil and dry grass evenly. A second anomalous 
target, blackbrush, was added to pixel vectors 998-1002, $\{r_i\}_{i=998}^{1002}$, at abundance 
fractions 5% while reducing the abundance fractions of red soil and dry grass evenly to 
make the total concentration of each pixel vector 100%. A third anomalous target, 
sagebrush, was added to pixel vectors 2998-3002, $\{r_i\}_{i=2998}^{3002}$, at abundance fractions 
5% while reducing the abundance fractions of red soil and dry grass evenly to make the 
total concentration of each pixel vector 100%. White Gaussian noise is also added to each 
pixel vector to achieve a 30:1 signal-to-noise ratio as was defined in (Harsanyi and 
Chang, 1994). The resulting band-30 spectra of the simulated 4000 pixel vectors are 
plotted in Fig. 7.25(a), which contains three types of anomalous targets: blackbrush, 
creosote leaves and sagebrush at pixel vectors $\{r_i\}_{i=998}^{1002}$, $\{r_i\}_{i=1998}^{2002}$ and 
$\{r_i\}_{i=2998}^{3002}$ respectively. The results of detecting these three anomalous targets 
using RXD with $K^{-1}$ and $R^{-1}$ are shown in Figs. 7.25(b) and 7.25(c) respectively. 
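The simulation described above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the five signatures are random stand-ins for the measured spectra (which are not reproduced here), the number of bands is taken to be 158, the 30:1 signal-to-noise ratio is read as 50% reflectance divided by the noise standard deviation, and RXD using $R^{-1}$ is assumed to be the quadratic form $r^T R^{-1} r$ without mean removal. The names `simulate_pixels` and `rxd` are hypothetical.

```python
import numpy as np

def simulate_pixels(n_pixels=4000, n_bands=158, seed=0):
    """Build mixed pixel vectors of red soil and dry grass with three anomalies."""
    rng = np.random.default_rng(seed)
    # Hypothetical stand-ins for the five measured reflectance spectra.
    signatures = rng.uniform(0.1, 0.9, size=(5, n_bands))
    # Columns: red soil, dry grass, creosote leaves, blackbrush, sagebrush.
    abundance = np.zeros((n_pixels, 5))
    abundance[:, 1] = np.linspace(0.0, 1.0, n_pixels)  # dry grass, about 0.025% per pixel
    abundance[:, 0] = 1.0 - abundance[:, 1]            # red soil
    anomalies = [(2, 1998, 2002, 0.10), (3, 998, 1002, 0.05), (4, 2998, 3002, 0.05)]
    for column, first, last, fraction in anomalies:
        rows = slice(first - 1, last)        # pixel numbers in the text are 1-based
        abundance[rows, :2] -= fraction / 2  # reduce red soil and dry grass evenly
        abundance[rows, column] = fraction
    sigma = 0.5 / 30.0  # assumed reading of the 30:1 signal-to-noise ratio
    noise = rng.normal(0.0, sigma, size=(n_pixels, n_bands))
    return abundance @ signatures + noise, abundance

def rxd(pixels, use_covariance=True):
    """Quadratic-form anomaly score using the inverse sample covariance K or correlation R."""
    x = pixels - pixels.mean(axis=0) if use_covariance else pixels
    sample_matrix = x.T @ x / len(x)  # K when centered, R otherwise
    inverse = np.linalg.inv(sample_matrix)
    return np.sum((x @ inverse) * x, axis=1)

pixels, abundance = simulate_pixels()
scores_k = rxd(pixels, use_covariance=True)
scores_r = rxd(pixels, use_covariance=False)
```

Plotting `scores_k` and `scores_r` against the pixel index gives the analogue of Figs. 7.25(b) and 7.25(c); with random stand-in signatures the detected magnitudes will not match those in the figures.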




As we can see from Figs. 7.25(b) and 7.25(c), both versions of RXD detected the 
creosote leaves, but RXD using was more effectively than RXD using In 

detection of blackbrush and sagebrush, RXD using extracted small fractions of 
their abundance, but it did better than RXD using which missed both blackbrush 

and sagebrush. This may be due to the fact that the simulated data are stationary and the 
sample mean had more impact than the variance did in anomaly detection. Interestingly, 
when LPTD using $K^{-1}$ and $R^{-1}$ were applied to the same simulated data in Fig. 
7.25(a), the detection results of LPTD using $K^{-1}$ shown in Fig. 7.26(a) were much 
better than the RXD results in Fig. 7.25(b), despite the fact that the 
abundance fractions of the three anomalies detected in Fig. 7.26(a) were negative. By 
contrast, LPTD using $R^{-1}$ performed very poorly, detecting only very small amounts of 
creosote leaves and sagebrush. It should be noted that UTD is the LPTD using $K^{-1}$, in 
which the correlation matrix $R$ in LPTD is replaced with the covariance matrix $K$. 




Figure 7.26. (a) LPTD using $K^{-1}$; (b) LPTD using $R^{-1}$ 



When RXD-UTD and RXD-LPTD were further applied to the same data in Fig. 7.25(a), 
their detection results shown in Fig. 7.27(a-b) were very interesting: the results 
produced by RXD-UTD in Fig. 7.27(a) were almost an upside-down version of Fig. 7.26(a) with 
different abundance fractions. Once again, LPTD using $R^{-1}$ did not perform well in 
detecting blackbrush and sagebrush as shown in Fig. 7.26(c). 




Figure 7.27. (a) RXD-UTD; (b) RXD-LPTD 



If we further compare the detection results in Fig. 7.27(a) to those in Fig. 7.25(b), 
RXD-UTD performed better than RXD in terms of the magnitude of detected anomalies. 

Example 7.12 

The reason that RXD did not detect the blackbrush and sagebrush as well as the 
creosote leaves was that their abundance fractions in the 
simulated data were only 5%, which is relatively small. Under such circumstances, the 
noise begins to show its dominance in detection performance. In this example, we 
investigate this issue using the same data set as in Example 7.11. Following an approach 
similar to that of Section 7.3, the number of eigenvectors, q, will be used to analyze 
the sensitivity of RXD to noise. Fig. 7.28 shows the results of RXD, NRXD, MRXD, 
UTD and RXD-UTD at q = 158. RXD detected the creosote leaves, but missed the 
blackbrush while barely detecting sagebrush. By contrast, UTD and RXD-UTD 
performed reasonably well in detecting the three anomalous targets at q = 158. On the 
other hand, NRXD and MRXD performed very poorly at q = 158. 




Figure 7.28. Results of RXD, NRXD, MRXD, UTD and RXD-UTD at q = L = 158 



However, the situation changed when q was small. Since there were only five 
material substances in the data set, q should not be too large. So, in this case, q was 
selected from 2 to 6 and 10 for experiments. Figs. 7.29-7.34 show the results of RXD, 
NRXD, MRXD, UTD and RXD-UTD respectively. 




Figure 7.29. Results of RXD, NRXD, MRXD, UTD and RXD-UTD at q = 10 

Figure 7.33. Results of RXD, NRXD, MRXD, UTD and RXD-UTD 

Figure 7.34. Results of RXD, NRXD, MRXD, UTD and RXD-UTD at q = 2 



From these figures, RXD, UTD and RXD-UTD performed better than they did at q 
= 158. Their performance was very stable at q = 2, 3, 4, 5, 6 and 10. The only visible 
improvement resulting from using a smaller q was more removal of interference resulting 
from other pixels. Interestingly, NRXD and MRXD performed much better than they did 
at q = 158 and their performance improved as q was decreased from 10 to 2. 

The above computer simulations demonstrated an important fact: q plays a 
crucial role in anomaly detection. In order for an anomaly detector to be effective, 
q must be appropriately small. As mentioned previously, determination of q is 
considered to be very difficult. Similar experiments were also conducted for RXD, 
NRXD, MRXD, LPTD and RXD-LPTD using $R^{-1}$. The results are not included here 
because they were comparable to those obtained in Figs. 7.28-7.34 using $K^{-1}$, with 
slight degradation in performance. 
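The role of q can be made concrete with a small sketch. It assumes that the q eigenvectors are those associated with the q largest eigenvalues of the sample covariance matrix, which is one reading of the procedure of Section 7.3 and not necessarily the exact one used for the figures. The function names `rank_q_inverse` and `rxd_rank_q` are hypothetical.

```python
import numpy as np

def rank_q_inverse(matrix, q):
    """Approximate inverse of a symmetric matrix built from its q largest eigenpairs."""
    eigenvalues, eigenvectors = np.linalg.eigh(matrix)  # eigenvalues in ascending order
    top_values = eigenvalues[-q:]
    top_vectors = eigenvectors[:, -q:]
    # V diag(1/lambda) V^T restricted to the retained eigenvectors.
    return (top_vectors / top_values) @ top_vectors.T

def rxd_rank_q(pixels, q):
    """RXD-type score using the rank-q inverse of the sample covariance matrix."""
    centered = pixels - pixels.mean(axis=0)
    covariance = centered.T @ centered / len(pixels)
    inverse_q = rank_q_inverse(covariance, q)
    return np.sum((centered @ inverse_q) * centered, axis=1)
```

With q equal to the number of bands the rank-q inverse coincides with the ordinary inverse, so the full-band detector is recovered as a special case; smaller values of q discard the directions with the smallest eigenvalues, which are the ones most strongly amplified by the inversion.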

Example 7.13 

In this example, the two hyperspectral images in Figs. 1.6(a) and 1.7(a) were used to 
investigate the effects of using different values of q on the detection performance of 
RXD, NRXD, MRXD, UTD and RXD-UTD. In order to determine an appropriate q, 
some partial knowledge may be helpful. Since there are five known target signatures plus 
an anomaly in the AVIRIS image in Fig. 1.6(a), an adequate q may be around 7. 
By using the results in Table 17.1 in Chapter 17 for VD, q was selected to be 4, 5 and 8 for 
the AVIRIS image experiments. Figs. 7.35-7.37 show the results of RXD, NRXD, MRXD, 
UTD and RXD-UTD obtained for q = 4, 5 and 8 respectively. As we can see from these 
figures, they all performed very similarly for small values of q. Compared to Figs. 6.2(a-b) and 
6.3(a), NRXD, MRXD and UTD performed quite differently. In particular, UTD 
extracted the cinders along with the image background. 






Similar experiments were also conducted for the 15-panel HYDICE image where q 
was chosen to be 18, 20 and 22 according to Table 17.2 in Chapter 17. Figs. 7.38-7.40 
show the results of RXD, NRXD, MRXD, UTD and RXD-UTD for q = 18, 20 and 22 
respectively. 




As shown in these images, the results were very similar. As in the AVIRIS image 
experiments, NRXD, MRXD and UTD performed differently from what they did at q = 
169. It is also interesting to note that UTD could detect the panels in row 1, in contrast to 
Fig. 5.5(d) where it extracted image background. This example demonstrated that a small 
q could make a difference in the performance of NRXD, MRXD and LPTD, specifically, 
UTD. The same experiments were also conducted for RXD, NRXD, MRXD, LPTD and 
RXD-LPTD using $R^{-1}$. The results showed that there was little difference for RXD-LPTD 
using $K^{-1}$ and using $R^{-1}$ when q was small. 

Example 7.14 

This example is particularly designed to study the subtle difference in noise sensitivity 
resulting from RXD using $K^{-1}$ and $R^{-1}$. The 15-panel HYDICE scene can be used for 
this purpose because the ground truth map in Fig. 1.7(b) provides the precise locations of 
all panel pixels. The image in Fig. 7.41(a) shows the anomalous targets detected by RXD 
using $K^{-1}$ when full bands were used, i.e. q = 169. Next to the image is a plot that 
shows the abundance of targets detected in the image. The labeled peak in the plot 
is the maximum amount of abundance detected in the image. Similarly, Fig. 7.41(b) 
shows the results of RXD using $R^{-1}$. 




As we can see from Fig. 7.41(a-b), there was no appreciable difference between RXD 
using $K^{-1}$ and $R^{-1}$. The highest peak, appearing in both cases, occurred at the 
interferer located at the upper left corner of the image. 

However, when q was decreased from 169 to 80, the targets with the highest peak detected by 
RXD using $K^{-1}$ and $R^{-1}$ were different. 

As shown in the plot of Fig. 7.42(a), the same interferer as in Fig. 7.41(a) produced the 
highest peak. However, this was not true when $R^{-1}$ was used in RXD. The pixel that 
produced the highest peak in the plot of Fig. 7.42(b) was a panel pixel, not the 
interferer in Fig. 7.42(a). But when q was further decreased to 60, both versions produced the 
same target pixel with the highest peak, as shown in Fig. 7.43(a-b). 

Interestingly, as q was reduced to 40, the image in Fig. 7.44(a) still kept the same 
target pixel with the highest peak, but the target pixel with the highest peak in 
Fig. 7.44(b) was shifted to another panel pixel. 

When q was further reduced to 20, the same panel pixel remained the highest 
peak in the image shown in Fig. 7.45(a), but the target pixel with the highest peak 
was shifted again to a different panel pixel, as shown in Fig. 7.45(b). 




Finally, when q was down to 10 and 5, the images in Figs. 7.46-7.47 produced 
by RXD using $K^{-1}$ and $R^{-1}$ were nearly the same, with the same target pixel 
having the highest peak. 

Compared to Figs. 7.41-7.45, a noticeable difference was that the image background 
in Figs. 7.46-7.47 was almost nulled out. In particular, the strong interferer and the panel 
pixel detected in an early stage at a larger q completely vanished. This implied that 
using a smaller q was more effective in background removal than using a larger q. 

As a concluding remark, the study of noise sensitivity is a challenging issue that has 
been widely overlooked in the past. Using full bands is not always a good solution. 
Instead, finding an appropriate q is crucial to anomaly detection. 



7.5 CONCLUSIONS 

This chapter investigates two important issues arising in subpixel target detection, 
sensitivity to target knowledge and sensitivity to noise. Both play a significant role in 
the performance of TACSD, TSCSD and anomaly detection. In order to illustrate how 
these two issues affect the detection performance, a comprehensive experiment-based 
comparative analysis is conducted for OSP, SCLS, NCLS, CEM and TCIMF. Since 
anomaly detection does not require a priori target knowledge, only the issue of noise 
sensitivity was investigated. A series of computer simulations is also conducted to 
analyze the detection performance of the five anomaly detectors presented in Chapter 6. The 
experiments demonstrated that the noise has significant impact on anomaly detection due 
to the small energy of anomalies. In order to reduce the noise sensitivity, the number of 
eigenvectors, q, that is used to calculate the inverse of the sample covariance or correlation 
matrix must be small. How to select an appropriate q is generally a challenging problem. 
This issue will be addressed in Chapter 17. 



UNCONSTRAINED MIXED PIXEL CLASSIFICATION 



Classification in standard image processing is usually performed spatially on a pure 
pixel basis. The objects of interest are classified based on their spatial properties. In 
remote sensing image processing, image data are acquired by a number of spectral 
channels at different wavelengths; thus they are actually image cubes with the third 
dimension specified by spectral wavelengths. As a result, each image pixel is a column 
vector, of which each component is represented by a particular spectral band. The spectral 
information contained in such an image pixel vector cannot be generally explored by 
spatial-based classification methods. Mixed pixel classification (MPC) provides a 
solution. A general approach to MPC is linear unmixing which models the spectral 
signature of each pixel vector as a linear mixture of spectral signatures of targets present 
in the image. The classification is then performed based on the abundance fraction 
estimated for the spectral signature of each target resident in the pixel vector. As a result, 
MPC-generated images are generally gray scale and represent the abundance fractions of 
target signatures estimated from the image data. So, MPC is usually carried out by 
visual inspection. This approach is very different from the (0,1)-class membership 
assignment carried out by pure pixel classification (PPC) where a "1" implies that the 
object belongs to a specific class and a "0" indicates that the object is not in that 
particular class. There is no soft decision in PPC as is done in MPC with estimated 
abundance fractions. In Part III, least-squares based approaches to MPC are investigated, 
particularly, those without constraints imposed on abundance fractions. Chapter 8 
describes a class of well-established OSP-based MPC methods that have shown success 
in hyperspectral image classification. It includes the OSP classifier first proposed by 
Harsanyi and Chang (1994), and several a posteriori OSP classifiers derived from 
Harsanyi-Chang's OSP classifier. Chapter 9 uses a standardized hyperspectral data set to 
conduct a comparative analysis and quantitative study among several commonly used 
methods, specifically, the OSP-based MPC and Fisher's linear discriminant analysis-based 
PPC. A set of custom-designed criteria is also introduced for performance 
evaluation. In order to make a comparison between MPC and PPC, the MPC-generated 
gray scale images must be converted to class-labeling images. Two thresholding criteria, 
winner-take-all and abundance percentage cutoff, are also developed for this purpose in 
Chapter 9 to convert a mixed pixel to a pure pixel. As a consequence of thresholding, the 
gray scale information produced by MPC for abundance fractions will be lost, which will 
result in performance degradation. A comprehensive experiment-based study on such 
mixed-to-pure pixel conversion is also conducted in Chapter 9 to demonstrate the 
significant importance of information provided by a mixed pixel through MPC. 
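As a rough illustration of the mixed-to-pure pixel conversion, the two thresholding criteria can be sketched as follows. The precise definitions are given in Chapter 9; the versions here are assumptions made for illustration only: winner-take-all assigns each pixel to the class with the largest estimated abundance fraction, and the percentage cutoff labels a pixel as a member of every class whose abundance fraction reaches a chosen threshold. The function names are hypothetical.

```python
import numpy as np

def winner_take_all(abundance):
    """Assign each pixel (row) to the single class with the largest abundance fraction."""
    labels = np.zeros_like(abundance, dtype=int)
    labels[np.arange(len(abundance)), abundance.argmax(axis=1)] = 1
    return labels

def percentage_cutoff(abundance, cutoff=0.5):
    """Label a pixel as class j whenever its abundance fraction for j reaches the cutoff."""
    return (abundance >= cutoff).astype(int)

# Three pixels, three target signatures: rows are estimated abundance fractions.
abundance = np.array([[0.70, 0.20, 0.10],
                      [0.30, 0.30, 0.40],
                      [0.45, 0.50, 0.05]])
wta_labels = winner_take_all(abundance)
cutoff_labels = percentage_cutoff(abundance, cutoff=0.5)
```

In both cases the gray scale abundance fractions are replaced by (0,1) memberships; the second pixel shows the loss of information, since its nearly even mixture is forced into one class by winner-take-all and into no class by a 50% cutoff.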
UNCONSTRAINED MIXED PIXEL CLASSIFICATION: 
LEAST-SQUARES SUBSPACE PROJECTION 



The orthogonal subspace projection (OSP) for hyperspectral image classification was 
first reported in Harsanyi and Chang (1994) and has been successfully applied to 
hyperspectral data exploitation since then. Its ability in subpixel detection was also 
demonstrated in Chapter 3. As we recall from (3.9), the OSP-derived detector was given by 

$$\delta_{OSP}(r) = d^T P_U^\perp r$$

with the scale constant $\kappa = 1$. In Chapter 6, we have seen that this 
scale constant $\kappa$ was actually determined by the a posteriori information that was used to 
estimate the unknown abundance fractions. Since OSP assumed the complete knowledge 
of the target signature matrix M and did not estimate the abundance vector $\alpha$, the scale 
constant $\kappa$ was absent in $\delta_{OSP}(r)$. As long as the abundance fractions detected for $\alpha$ 
provide sufficient amounts for target detection, it did not matter if $\alpha$ was estimated 
accurately. That was why OSP worked effectively for the real hyperspectral data 
experiments in Harsanyi and Chang (1994). However, this may not be true in terms of 
abundance estimation. So, in this chapter, the OSP in Chapter 3 is revisited for mixed 
pixel classification. It is then extended by three unconstrained least-squares subspace 
projection approaches, called signature subspace projection (SSP), target subspace 
projection (TSP) and oblique subspace projection (OBSP) where the abundance fractions 
of target signatures are not known a priori, but are required to be estimated from the 
data. The three subspace projection methods use their estimated signature abundance 
fractions to achieve target classification in a mixed pixel. As a result, they can be viewed 
as a posteriori OSP as opposed to the OSP in Chapter 3, which can be thought of as a 
priori OSP. In order to evaluate these three approaches, a least-squares estimation error is 
cast as a signal detection problem in the framework of the Neyman-Pearson detection 
theory so that the detection performance can be measured by the receiver operating 
characteristics (ROC) analysis. 



8.1 INTRODUCTION 

Image classification is a segmentation method, which aggregates image pixels into a 
finite number of classes by certain rules so that each class represents a distinct entity with 
specific properties. In multispectral/hyperspectral imagery, the spectral signature of a 
scene pixel is generally mixed by a number of target (or endmember) spectral signatures. 
Consequently, such a scene pixel is considered as a mixed pixel. Two models have been 
proposed in the past to describe such mixing activities. One is the macrospectral mixture 
(Singer and McCord, 1979), which models a mixed pixel as a linear combination of 
target signatures resident in the pixel with relative concentrations. A second model 
suggested by Hapke (1981), called the intimate spectral mixture, is a nonlinear mixing of 
target signatures present within the pixel. Nevertheless, Hapke's model can be linearized 
by a method proposed by Johnson et al. (1983). In this chapter, only the linear spectral 
mixture model will be considered. By taking advantage of a linear mixture model, many 
image processing techniques are readily applied. Of particular interest is the principal 
components analysis (PCA) (Richards and Jia, 1999; Schowengerdt, 1997), also known 
as Karhunen-Loeve transformation, which is widely used to achieve data dimensionality 
reduction, feature extraction, etc. As a result of PCA, the data coordinates will be rotated 
in such a manner that the significant information of the data can be prioritized in 
accordance with the magnitude of the eigenvalues of the data covariance matrix. However, 
two disadvantages arise from the PCA approach. One is that the pixels in the PCA- 
transformed data are still a mixture of target signatures with unknown abundance 
fractions. So, the determination and identification of individual target spectral signatures 
remains unsolved. Malinowski (1977) and Huete (1986) proposed a solution. They first 
reconstructed the original data using the largest PCA-generated eigenvalue and measured 
the error between the raw data and the reconstructed data to see if the error fell within the 
prescribed tolerance. If not, they gradually added to the data reconstruction by including 
additional eigenvalues in decreasing magnitude until the resulting error fell within the desired 
error level. A second disadvantage resulting from PCA is that PCA is only optimal in 
the sense of minimum mean squared error, but not necessarily optimal in terms of class 
discrimination and separability (Jenson and Waltz, 1979; Juang and Katagiri, 1992; Lee 
and Landgrebe, 1993) and signal-to-noise ratio (Green et al. 1981; Lee et al., 1990). 
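The reconstruction procedure attributed above to Malinowski and Huete can be sketched as follows. This is a minimal illustration, assuming that the error is measured as the root-mean-square difference between the mean-removed data and its reconstruction; the error measures actually used in the cited works may differ. The function name `components_needed` is hypothetical.

```python
import numpy as np

def components_needed(data, tolerance):
    """Add principal components in decreasing eigenvalue order until the
    reconstruction error falls within the prescribed tolerance."""
    centered = data - data.mean(axis=0)
    covariance = centered.T @ centered / len(data)
    eigenvalues, eigenvectors = np.linalg.eigh(covariance)
    order = np.argsort(eigenvalues)[::-1]  # largest eigenvalue first
    error = np.inf
    for k in range(1, len(order) + 1):
        basis = eigenvectors[:, order[:k]]
        reconstructed = centered @ basis @ basis.T
        error = np.sqrt(np.mean((centered - reconstructed) ** 2))
        if error <= tolerance:
            return k, error
    return len(order), error

# Example: data generated by two underlying components plus very weak noise.
rng = np.random.default_rng(2)
example = rng.normal(size=(200, 2)) @ rng.normal(size=(2, 8))
example = example + 1e-6 * rng.normal(size=(200, 8))
n_components, final_error = components_needed(example, tolerance=1e-3)
```

The number of components returned indicates how many eigenvalues are needed to explain the data, but, as noted above, the retained components are still mixtures of the target signatures rather than the signatures themselves.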

The OSP in Chapter 3 developed for subpixel detection can be also used for mixed 
pixel classification. It formulated an image classification problem as a generalized 
eigenvalue problem. The resulting classifier is an operator composed of two linear filters, 
a simultaneous diagonalization filter developed in Miller (1992) followed by a second 
filter, called matched filter derived from communication systems. However, as mentioned 
previously the model on which OSP was based assumed the complete target knowledge 
without estimating the abundance vector $\alpha$. As a result, it can detect and classify correct 
target signatures, but cannot correctly estimate amounts of abundance fractions of target 
signatures. Despite the success of OSP in classification of AVIRIS data (Harsanyi and 
Chang, 1994; Farrand and Harsanyi, 1997), there is a lack of theory to explain how OSP 
works. In this chapter, we will revisit the OSP approach and offer a theoretical 
background for OSP from an estimation theory's point of view. The theory is derived 
based on unconstrained least-squares estimation and can be used to estimate target 
abundance fractions rather than detect target abundance fractions as does OSP. 
Accordingly, the approaches presented in this chapter can be referred to as a posteriori 
OSP, while the OSP in Chapter 3 (as also in Harsanyi and Chang, 1994) can be viewed 
as a priori OSP. A recent work in Settle (1996) also derived an unconstrained Gaussian 
maximum likelihood (GML) classifier which generated the same detector, 
$\delta_{OSP}(r) = d^T P_U^\perp r$ specified by (3.9), that was derived by Harsanyi and Chang (1994), 
with an additional constant $\kappa = (d^T P_U^\perp d)^{-1}$. Two differences between the 
unconstrained GML classifier and OSP are significant. One is this constant $\kappa$ that has 
been overlooked in the past. It has been considered as a normalization constant, which is 
immaterial to data processing. As noted in Chapters 5 and 6 this constant is indeed an 
important factor. It is determined by a posteriori information that can be used to 
estimate the abundance fractions. It is also very closely related to the accuracy of 
abundance estimation as demonstrated in Settle (1996) and Chang (1998). Another is the 
design rationale. The unconstrained GML classifier is an a priori approach, which 
maximizes the Gaussian conditional probability distribution of an unknown constant 
specified by target signature abundance fractions. Compared to the unconstrained GML 
classifier, OSP maximizes the signal-to-noise ratio based on Fisher's discriminant 
criterion, which only depends upon the noise second-order statistics. It can be derived as 
an a posteriori approach as will be shown in this chapter. When the noise is assumed to 
be Gaussian, the unconstrained GML classifier turns out to be a posteriori OSP. 

This chapter presents three approaches that extend OSP in Chapter 3 to the case that 
the signature abundance fractions can be estimated directly from the image based on least- 
squares subspace projection. They all are unconstrained least-squares estimation methods. 
The first approach was proposed in Tu et al. (1998), which projected image pixel vectors 
into a signature space linearly spanned by target signatures in M so that the interfering 
effects caused by unwanted signatures can be suppressed by being mapped into a space 
orthogonal to the target signature subspace. The resulting projector is referred to as the 
signature subspace projection (SSP) classifier. Since the target signatures are those in 
which we are interested, a second approach is to project the observed pixels directly into 
a space that is linearly spanned only by the target signatures of interest rather than the 
entire set of target signatures in M. The resulting classifier is referred to as the target subspace 
projection (TSP) classifier. As we might expect, such a TSP classifier should perform 
better than the SSP classifier in the sense that signatures other than the desired target 
signatures have been suppressed by the target subspace projection. Unfortunately, this is 
generally not the case. The drawback of the TSP classifier is that the undesired target 
signatures are not necessarily orthogonal to the desired target signatures, so the undesired 
target signatures may be mixed into the signature space generated by the desired 
target signatures. As a result of such mixing, a bias becomes unavoidable. In order to 
cope with this problem, a third approach based on the oblique subspace projection 
(Behrens and Scharf, 1994) is suggested to eliminate such a bias. It projects the desired 
target signatures and undesired target signatures into two separate subspaces, the 
range space and the null space respectively. Since these two spaces are disjoint, no mixing 
will occur and neither will the bias. The resulting classifier will be referred to as an oblique 
subspace projection (OBSP) classifier. But the OBSP classifier comes at a price. It is no 
longer an orthogonal classifier as are the SSP classifier and the TSP classifier. 
Nevertheless, it is still a projector. What is interesting is that the SSP classifier can 
actually be decomposed into two oblique projectors, one of which happens to be exactly 
the OBSP classifier. Moreover, we can also show that the OBSP classifier is identical to 
the unconstrained GML classifier derived in Settle (1996). In particular, the SSP 
classifier and the OBSP classifier are essentially equivalent in the sense of minimizing 
the least-squares error regardless of the fact that one is orthogonal and the other is not. 

In order to evaluate the performance of these three classifiers, we model the least- 
squares based abundance estimation problem as a signal detection problem where the true 
target signature abundance is the desired signal and the estimation error is a result of the 
noise. By virtue of this detection model, the effectiveness of each classifier can be 
evaluated using the receiver operating characteristics (ROC) analysis. Interestingly, both 
the SSP classifier and the OBSP classifier generate an identical ROC curve. This implies 
that they basically are the same classifier. Surprisingly, the OSP classifier turns out to be 
the SSP classifier and the OBSP classifier with the constant $\kappa = (d^T P_U^\perp d)^{-1}$ discarded. 

Since this constant affects the magnitude of the estimated abundance fractions of target 
signatures contained in a mixed pixel, the unconstrained GML classifier, the OSP 
classifier, the SSP classifier and the OBSP classifier can be considered as the same 
classifier with different classification powers. As for the TSP classifier, it produces a bias 
that deteriorates its performance. In order to resolve this problem, an unsupervised TSP 
classifier using the unsupervised algorithm, UVQ in Chapter 5 was developed in 
Brumbley and Chang (1999) where the bias was effectively eliminated by an 
orthogonalization algorithm developed in Chen et al. (1991). 



8.2 A POSTERIORI OSP 

In order to estimate $\alpha = (\alpha_1, \alpha_2, \ldots, \alpha_p)^T$ directly from the image data, several 

techniques have been developed (Settle, 1996; Tu et al., 1997; Chang, 1998; Chang et 
al. 1998b) based on a posteriori information obtained directly from the image scene. In 
this case, the model described by (3.1) in Chapter 3 can be expressed in terms of the a 
posteriori abundance estimation and the a posteriori noise estimation and given by 



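$$r = M\hat{\alpha} + \hat{n} = d\hat{\alpha}_p + U\hat{\gamma} + \hat{n} \qquad (8.1)$$

where $\hat{\alpha}$, $\hat{\alpha}_p$ and $\hat{\gamma}$ are estimates of $\alpha$, $\alpha_p$ and $\gamma$ respectively based on the observed 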
pixel vector, r. Accordingly, the model specified by (8.1) is called an a posteriori model 
as opposed to the model by (3.1), which can be viewed as a Bayes or a priori model. 
For simplicity, the dependency on r will be dropped from all the notations of estimates 
throughout the rest of this chapter. 

8.2.1 Signature Subspace Projection (SSP) Classifier 

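Using the least-squares error as an optimal criterion, the optimal least-squares 
estimate of $\alpha$, $\hat{\alpha}_{LS}$, was given by (3.12) in Chapter 3. Substituting (3.12) for the estimate 
of $\alpha$ in model (8.1) results in 

$$r = M\hat{\alpha}_{LS} + \hat{n}_{LS} \qquad (8.2)$$

where 

$$\hat{n}_{LS} = r - M\hat{\alpha}_{LS} = M(\alpha - \hat{\alpha}_{LS}) + n. \qquad (8.3)$$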



From (3.12) we define a signature space orthogonal projector by $P_M = M(M^T M)^{-1}M^T$, 
which projects r into the signature space < M >. Applying $P_M$ to model (8.2) yields 

$$P_M r = P_M M\hat{\alpha}_{LS} + P_M \hat{n}_{LS} \qquad (8.4)$$

$$\phantom{P_M r} = M\hat{\alpha}_{LS} \qquad (8.5)$$

where $P_M M = M$ and the term $P_M \hat{n}_{LS}$ vanishes in (8.5) since $P_M$ annihilates $\hat{n}_{LS}$. 

Incorporating $P_M$ into the OSP detector given by (3.9) results in a signature 
subspace projection (SSP) vector, $(w^{SSP})^T$, given by 

$$(w^{SSP})^T = d^T P_U^\perp P_M. \qquad (8.6)$$

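Now, if we apply $(w^{SSP})^T$ to both the a priori model (3.1) and the a posteriori model (8.1), 
we obtain 

$$(w^{SSP})^T r = d^T P_U^\perp P_M (M\alpha + n) = d^T P_U^\perp P_M d\,\alpha_p + d^T P_U^\perp P_M n \qquad (8.7)$$

and 

$$(w^{SSP})^T r = (w^{SSP})^T (M\hat{\alpha}_{LS} + \hat{n}_{LS}) = d^T P_U^\perp P_M d\,\hat{\alpha}_{SSP,p} \qquad (8.8)$$

where the estimate of $\alpha_p$ is denoted by $\hat{\alpha}_{SSP,p}$. Equating (8.7) and (8.8) gives rise to 

$$d^T P_U^\perp P_M d\,\hat{\alpha}_{SSP,p} = d^T P_U^\perp P_M d\,\alpha_p + d^T P_U^\perp P_M n. \qquad (8.9)$$

Dividing (8.9) by $d^T P_U^\perp d$ and using the equality $d^T P_U^\perp P_M d = d^T P_U^\perp d$ result in 

$$\hat{\alpha}_{SSP,p} = \alpha_p + (d^T P_U^\perp d)^{-1} d^T P_U^\perp P_M n. \qquad (8.10)$$

Using (8.8) and (8.10) we can define a signature subspace classifier, $\delta_{SSP}(r)$, by 

$$\delta_{SSP}(r) = \hat{\alpha}_{SSP,p} = (d^T P_U^\perp d)^{-1} d^T P_U^\perp P_M r. \qquad (8.11)$$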



The estimation error resulting from (8.11) and (8.10) is given by 

$$\varepsilon_{SSP,p} = \hat{\alpha}_{SSP,p} - \alpha_p = (d^T P_U^\perp d)^{-1} d^T P_U^\perp P_M n \qquad (8.12)$$

with $SNR_{SSP}$ defined by (3.6) or (3.10) as the maximum eigenvalue 

$$SNR_{SSP} = \frac{\alpha_p^2}{\sigma^2}\,(d^T P_U^\perp P_M P_U^\perp d)^{-1}\,[d^T P_U^\perp P_M d]^2 = \frac{\alpha_p^2}{\sigma^2}\, d^T P_U^\perp d \qquad (8.13)$$

where $\sigma^2$ is assumed to be the variance of the noise n. 

Two comments are noteworthy. 

1. Comparing (8.13) to (3.6), the maximum SNR (or eigenvalue) generated by $\delta_{OSP}(r)$ 
is exactly the same as that produced by $\delta_{SSP}(r)$ due to $P_U^\perp P_M d = P_U^\perp d$. This 
shows that applying $P_M$ does not increase the SNR. In other words, the maximum 
eigenvalue obtained by (3.6) remains unchanged after applying $P_M$. This 
observation can be explained by Malinowski's error theory (Malinowski, 1977a) in 
which the eigenvalues are divided into two classes, a primary set containing larger 
eigenvalues and a secondary set of smaller eigenvalues. The former corresponds to 
imbedded errors that cannot be removed, whereas the latter represents experimental 
errors, which can be eliminated by designed techniques. According to this theory, 
the maximum eigenvalue is a primary eigenvalue, thus, it cannot be reduced by any 
estimation means. That (8.13) equals (3.6) illustrates this fact. A detailed error 
analysis resulting from the least-squares subspace approach was studied in Chang 
(1999). 

2. Note that the magnitude of the SNR is determined by the quantity $d^T P_U^\perp d$ in (8.13), 
namely, the degree to which the target signature d is correlated with the undesired signatures 
in U. So, if d is very similar to one or more signatures in U, $d^T P_U^\perp d$ (i.e. the 
projection of d onto $< U >^\perp$) will be small. This implies that the SNR will be low. 
As a result, the difficulty with the target signature discrimination is increased. 
Therefore, the magnitude of $d^T P_U^\perp d$ can be used as a measure of the discrimination 
power of the SSP classifier. The larger the $d^T P_U^\perp d$, the better the target 
discrimination. It should also be noted that the quantity $d^T P_U^\perp d$ is the inverse of 
the scale constant $\kappa$ that determines the accuracy of abundance estimation (Chang 
1998). This relationship further demonstrates how important a role $\kappa$ plays in the 
performance of detection and classification. 
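The SSP classifier (8.11) and the two comments above can be checked numerically. The sketch below uses random stand-in signatures (an assumption; no measured spectra are used) and hypothetical function names. It illustrates that the SSP estimate coincides with the last component of the unconstrained least-squares solution, and that it equals the OSP detector output $d^T P_U^\perp r$ multiplied by the scale constant $\kappa = (d^T P_U^\perp d)^{-1}$.

```python
import numpy as np

def perp_projector(U):
    """Orthogonal projector onto the complement of the column space of U."""
    return np.eye(U.shape[0]) - U @ np.linalg.pinv(U)

def osp_detector(r, d, U):
    """OSP detector output d^T P_U_perp r, without the scale constant."""
    return d @ perp_projector(U) @ r

def scale_constant(d, U):
    """kappa = (d^T P_U_perp d)^{-1}."""
    return 1.0 / (d @ perp_projector(U) @ d)

def ssp_estimate(r, d, U):
    """SSP abundance estimate kappa * d^T P_U_perp P_M r, with M = [U d]."""
    M = np.column_stack([U, d])
    P_M = M @ np.linalg.pinv(M)
    return scale_constant(d, U) * (d @ perp_projector(U) @ P_M @ r)

# Hypothetical signatures: three undesired signatures in U and one desired target d.
rng = np.random.default_rng(3)
U = rng.uniform(size=(20, 3))
d = rng.uniform(size=20)
r = 0.3 * d + U @ np.array([0.2, 0.4, 0.1]) + 0.01 * rng.normal(size=20)
alpha_ssp = ssp_estimate(r, d, U)
alpha_ls = np.linalg.lstsq(np.column_stack([U, d]), r, rcond=None)[0][-1]
```

Here `alpha_ssp` and `alpha_ls` agree, and a target d that is nearly collinear with a column of U makes $d^T P_U^\perp d$ small and $\kappa$ large, which is the loss of discrimination power described in the second comment.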

8.2.2 Target Subspace Projection (TSP) Classifier 

In the SSP classifier, $P_M$ projects r onto the entire signature space < M >. However, 
since we are only interested in classifying the target signature d, a natural approach is to 
project the observed pixel r onto the target signature space < d > rather than < M >. 
This results in a second classifier, called the target subspace projection (TSP) classifier, 
denoted by $\delta_{TSP}(r)$ and given by 

$$\delta_{TSP}(r) = d^T P_U^\perp P_d\, r \qquad (8.14)$$

where $P_d$ is defined in the same fashion that $P_M$ was defined, i.e., $P_d = d(d^T d)^{-1}d^T$. 
Unfortunately, as shown in Chang et al. (1998b), the TSP classifier created an 
additional bias, which requires the complete knowledge of target signatures in the 
undesired target signature matrix U. It was shown in Chang et al. (1998a) that the 
interference in hyperspectral imagery had significant impact on classification due to the 
sensors' high spatial and spectral resolution. Such interferers are also considered to be 
undesired target signatures. They are generally unknown and cannot be identified by prior 
knowledge. Therefore, a direct application of the TSP classifier may not be a good 
approach. In order to mitigate this problem, Brumbley and Chang (1999) used an 
unsupervised vector quantization-based (UVQ) algorithm presented in Section 5.2 of 
Chapter 5 to find undesired target signatures and interferers in an image scene. Then they 
further used an algorithm proposed in Chen et al. (1991) to orthogonalize these undesired 
target signatures and interferers with respect to the desired 
target signature prior to detection of d. As a consequence, there is no bias created by such 
undesired mixing. The resulting classifier is referred to as unsupervised vector 
quantization-based target subspace projection (UVQTSP) classifier given by 

<5„vQw(r) = d'0,s(UVQ)(r) (8.15) 

where is Chen's orthogonalization algorithm and the UVQ is the algorithm described 
in Section 5.2. 

8.2.3 Oblique Subspace Projection (OBSP) Classifier 

In the SSP classifier, the noise is first suppressed by $P_M$, followed by elimination
of the undesired signatures in U by the projector $P_U^\perp$. It would be convenient
if both operations could be done in one step. One such operator, called an
oblique subspace projection, was developed in Behrens and Scharf (1994). Let
$\langle d \rangle$ and $\langle U \rangle$ denote the range space and the null space
respectively. In this case, the oblique subspace projection is no longer orthogonal. Let
$E_{XY}$ be a projector with range space X and null space Y. Then $P_M$ can be
decomposed as a sum of two oblique projectors and expressed by

    $P_M = E_{dU} + E_{Ud}$    (8.16)

where

    $E_{dU} = d(d^T P_U^\perp d)^{-1} d^T P_U^\perp$    (8.17)

    $E_{Ud} = U(U^T P_d^\perp U)^{-1} U^T P_d^\perp$    (8.18)

with $E_{dU} d = d$ and $E_{dU} U = 0$.

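In analogy with (8.6), an oblique subspace projection (OBSP) vector, $w^{OBSP}$, can be
obtained via (8.17) by

    $(w^{OBSP})^T = d^T E_{dU} = (d^T d)(d^T P_U^\perp d)^{-1} d^T P_U^\perp$.    (8.19)

Applying (8.19) to model (3.1) and model (8.1) results in

    $(w^{OBSP})^T r = d^T E_{dU} r = d^T d\,\alpha_p + d^T E_{dU} n$    (8.20)

and

    $(w^{OBSP})^T r = d^T E_{dU} r = d^T d\,\hat{\alpha}_p + d^T E_{dU} \hat{n} = d^T d\,\hat{\alpha}_{OBSP,p}$    (8.21)

where $d^T E_{dU} \hat{n} = 0$. Equating (8.20) and (8.21) yields

    $\hat{\alpha}_{OBSP,p} = \alpha_p + (d^T d)^{-1} d^T E_{dU} n$    (8.22)

and

    $\hat{\alpha}_{OBSP,p} = \alpha_p + (d^T d)^{-1} d^T E_{dU} n = \alpha_p + (d^T P_U^\perp d)^{-1} d^T P_U^\perp n$.    (8.23)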

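So, from (8.22) and (8.23) we can define an oblique subspace classifier, $\delta_{OBSP}(r)$, by

    $\delta_{OBSP}(r) = (d^T d)^{-1} (w^{OBSP})^T r = (d^T d)^{-1} d^T E_{dU} r = (d^T P_U^\perp d)^{-1} d^T P_U^\perp r$    (8.24)

that produces the maximum SNR given by the maximum eigenvalue

    $\lambda_{OBSP,max} = (\alpha_p^2 / \sigma^2)\, d^T P_U^\perp d$.    (8.25)

Interestingly, $\lambda_{OBSP,max}$ is equal to $\lambda_{SSP,max}$ because (8.25) is
identical to (8.13).

Furthermore, the estimation error $\varepsilon_{OBSP,p}$ is obtained from (8.23) as

    $\varepsilon_{OBSP,p} = \hat{\alpha}_{OBSP,p} - \alpha_p = (d^T P_U^\perp d)^{-1} d^T P_U^\perp n$.    (8.26)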
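
To illustrate how the OBSP classifier acts on a pixel, the following is a minimal sketch that evaluates $(d^T P_U^\perp d)^{-1} d^T P_U^\perp r$ without forming any matrix inverse. It assumes the undesired signatures are linearly independent; the 4-band signatures, the abundances and the helper names `dot`, `reject` and `obsp_estimate` are hypothetical and serve only as an illustration.

```python
def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def reject(d, undesired):
    """Return P_U^perp d: the part of d orthogonal to span(undesired).

    Uses modified Gram-Schmidt on the undesired signatures (assumed
    linearly independent), so no matrix inverse is formed.
    """
    basis = []
    for u in undesired:
        v = list(u)
        for q in basis:
            c = dot(q, v)
            v = [vi - c * qi for vi, qi in zip(v, q)]
        norm = dot(v, v) ** 0.5
        basis.append([vi / norm for vi in v])
    out = list(d)
    for q in basis:
        c = dot(q, out)
        out = [oi - c * qi for oi, qi in zip(out, q)]
    return out


def obsp_estimate(r, d, undesired):
    """Abundance estimate (d^T P_U^perp d)^{-1} d^T P_U^perp r."""
    pd = reject(d, undesired)          # P_U^perp d
    # P_U^perp is symmetric and idempotent, so d^T P_U^perp x = (P_U^perp d)^T x.
    return dot(pd, r) / dot(pd, d)


# Hypothetical 4-band signatures, for illustration only.
d = [0.2, 0.6, 0.9, 0.4]
u1 = [0.5, 0.5, 0.4, 0.3]
u2 = [0.1, 0.3, 0.2, 0.8]
alpha = (0.2, 0.4, 0.4)                # (target, undesired 1, undesired 2)
r = [alpha[0] * a + alpha[1] * b + alpha[2] * c for a, b, c in zip(d, u1, u2)]
estimate = obsp_estimate(r, d, [u1, u2])
print(round(estimate, 6))
```

In this noise-free example the undesired signatures are annihilated by the projection, so the estimate recovers the target fraction up to floating-point rounding.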

8.2.4 Unconstrained Maximum Likelihood Estimation Classifier 

In the subspace projection approaches described in Sections 8.2.1-8.2.3, the only
assumption made on the noise n was that it is additive and white with variance $\sigma^2$.
If the noise n is further assumed to be Gaussian, p(r) in model (3.1) can be expressed as
a Gaussian distribution with mean $M\alpha$ and covariance $\sigma^2 I_{L \times L}$, i.e.,
$p(r) \sim N(M\alpha, \sigma^2 I_{L \times L})$. In this case, the least-squares estimate
of $\alpha$ for model (8.1) can be obtained by the unconstrained Gaussian maximum
likelihood (GML) classifier (Settle, 1996; Chang et al., 1998b; Chang, 1998), which is
given by

    $\hat{\alpha}_{GML}(r) = \arg\{\max_{\alpha} p(r)\} = (M^T M)^{-1} M^T r$.    (8.27)

In particular, the estimate of the p-th abundance $\alpha_p$ is given by

    $\hat{\alpha}_{GML,p} = (d^T P_U^\perp d)^{-1} d^T P_U^\perp (d\alpha_p + n)$
    $\qquad\qquad = \alpha_p + (d^T P_U^\perp d)^{-1} d^T P_U^\perp n$    (8.28)

and the associated estimation error is

    $\varepsilon_{GML,p} = \hat{\alpha}_{GML,p} - \alpha_p = (d^T P_U^\perp d)^{-1} d^T P_U^\perp n$.    (8.29)

So, using (8.28) we can define an unconstrained GML classifier, $\delta_{GML}(r)$, by

    $\delta_{GML}(r) = (d^T P_U^\perp d)^{-1} d^T P_U^\perp r = \hat{\alpha}_{GML,p}$    (8.30)

which produces the unconstrained Gaussian maximum likelihood estimate of $\alpha_p$ for the
desired target signature d based on an image vector r.

Comparing the estimation error in (8.12), which results from the SSP classifier, with the
estimation error resulting from $\delta_{GML}(r)$, $(d^T P_U^\perp d)^{-1} d^T P_U^\perp n$
in (8.29), we see that the two are different. However, if we further compare (8.26) to
(8.29), we discover that both equations are identical. This implies that the unconstrained
GML classifier is indeed the OBSP classifier provided the noise is white Gaussian. In this
case, the unconstrained GML classifier can be replaced with the OBSP classifier in mixed
pixel classification.
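
The agreement between the GML estimate and the OBSP estimate can also be checked directly from the least-squares normal equations. The following is a sketch, assuming $M = [U\ d]$ has full column rank; the symbol $\gamma$ for the abundances of the undesired signatures and the block partition of M are introduced here for illustration.

```latex
\begin{aligned}
&\text{Normal equations of } \min_{\gamma,\,\alpha_p} \| r - U\gamma - d\,\alpha_p \|^2 : \\
&\qquad U^T U \gamma + U^T d\,\alpha_p = U^T r, \qquad
        d^T U \gamma + d^T d\,\alpha_p = d^T r . \\
&\text{Eliminating } \gamma = (U^T U)^{-1} U^T (r - d\,\alpha_p)
  \text{ and writing } P_U = U (U^T U)^{-1} U^T : \\
&\qquad d^T P_U (r - d\,\alpha_p) + d^T d\,\alpha_p = d^T r
  \;\Longrightarrow\;
  \hat{\alpha}_p = \frac{d^T (I - P_U)\, r}{d^T (I - P_U)\, d}
                 = (d^T P_U^\perp d)^{-1} d^T P_U^\perp r .
\end{aligned}
```

The last expression is the p-th component of $(M^T M)^{-1} M^T r$ and has the same form as the OBSP classifier.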

Several observations are also worth noting. Comparing (8.23) and (8.28) to the abundance
estimate obtained in Settle (1996), the three abundance estimates (OBSP, the
unconstrained GML classifier, and Settle's unconstrained GML classifier) are identical.
Nevertheless, Settle provided an alternative approach to derive the unconstrained GML
classifier $\delta_{GML}(r)$ (Settle, 1996). This implies that when the GML classifier
and the OBSP classifier are applied to linear spectral mixing problems specified by (3.1),
they both arrive at the same classifier. Now comparing (8.24) and (8.30) to (3.9), there is
an extra constant, $(d^T P_U^\perp d)^{-1}$, appearing in (8.24) and (8.30). This constant
is actually the same scale constant discussed in Chapter 7 and results from abundance
estimation error. Since model (3.1) assumes complete knowledge of the abundance vector
$\alpha$, there is no need to estimate $\alpha$ in model (3.1). Consequently, the constant
is absent from (3.9). This demonstrates the significance of the constant
$(d^T P_U^\perp d)^{-1}$, which marks the distinction between the a priori model (3.1) and
the a posteriori model (8.2).

More interestingly, if we examine (8.24) and (8.30), we can discover that the scale
constant $\kappa = (d^T R_{L \times L}^{-1} d)^{-1}$ appearing in CEM is exactly the same as
the constant $(d^T P_U^\perp d)^{-1}$ appearing in $\delta_{OBSP}(r)$ and $\delta_{GML}(r)$
if and only if $R_{L \times L}^{-1}$ in CEM is replaced by $P_U^\perp$. This implies that
the a posteriori information generated by $\delta_{CEM}(r)$ is $d^T R_{L \times L}^{-1} d$,
which approximates the prior information specified by $d^T P_U^\perp d$ used in the two
classifiers $\delta_{OBSP}(r)$ and $\delta_{GML}(r)$. This relationship provides evidence
that the three classifiers $\delta_{CEM}(r)$, $\delta_{OBSP}(r)$ and $\delta_{GML}(r)$ are
optimal least-squares linear classifiers in the sense that they all make full use of the
available information (i.e., a priori target information, $d^T P_U^\perp d$, for
$\delta_{OBSP}(r)$ and $\delta_{GML}(r)$; a posteriori target information,
$d^T R_{L \times L}^{-1} d$, for $\delta_{CEM}(r)$) to achieve the best classification. The
difference is that $\delta_{CEM}(r)$ is a sample spectral correlation-based matched filter,
whereas $\delta_{OBSP}(r)$ is merely a pixel-level spectral matched filter. The advantage
of the former is that it does not require the prior information of U as the latter does.
When it comes to abundance estimation, the a posteriori OSP-based classifiers,
$\delta_{OBSP}(r)$ and $\delta_{GML}(r)$, include $\kappa = (d^T P_U^\perp d)^{-1}$ in their
filter design to account for abundance estimation error. If we further examine (4.15), the
TCIMF specified by its filter vector is exactly identical to $\delta_{OBSP}(r)$ and
$\delta_{GML}(r)$. This is no surprise since these three filters are designed based on the
same least-squares error criterion and they all generate the same abundance estimation
constant.

8.3 ESTIMATION ERROR EVALUATED BY ROC ANALYSIS 

In the previous section, three estimation errors were derived for the SSP classifier,
the OBSP classifier and the MLE classifier. These errors result from inaccurate
abundance estimation of unknown signatures. In order to evaluate the error performance
of these classifiers, we cast their associated estimation errors as a standard signal
detection problem (Poor, 1994) where $\hat{\alpha}_p$ is viewed as a true target signature
abundance fraction $\alpha_p$ corrupted by noise that is represented by the estimation
error. By virtue of this formulation, these three classifiers can also be interpreted as
subpixel target detectors, which can be used to detect the presence of a target signature
in a mixed pixel. The effectiveness of these detectors depends upon the accuracy of the
abundance estimate, $\hat{\alpha}_p$, and can be evaluated by receiver operating
characteristic (ROC) analysis via the Neyman-Pearson detection theory. An ROC curve is a
plot of the detection power versus the false alarm probability. Instead of using ROC
curves as a performance criterion, we define a measure, called detection rate (DR), which
calculates the area under an ROC curve to evaluate the effectiveness of the detector.
Obviously, DR always lies between 1/2 and 1. The worst case occurs when DR = 1/2, i.e.,
the detection power is equal to the false alarm probability, which implies that the
detector is useless. On the other hand, the best case occurs only when DR = 1, namely, the
detection power is always one regardless of the false alarm probability. Such ROC analysis
has been widely used in diagnostic imaging (Metz, 1978; Swets and Pickett, 1982) for
evaluation of computer-aided diagnostic methods where the detection power is measured by
the true-positive probability and the false alarm probability is represented by the
false-positive probability.
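
As an illustration of the detection rate, the sketch below computes the area under an ROC curve from sampled (false alarm probability, detection power) pairs. The trapezoidal rule and the function name `detection_rate` are assumptions made for this example, not a prescribed procedure.

```python
def detection_rate(roc_points):
    """Area under an ROC curve given (P_F, P_D) pairs, by the trapezoidal rule.

    The end points (0, 0) and (1, 1) are added so the curve spans the whole
    false-alarm axis.
    """
    pts = sorted(set(roc_points) | {(0.0, 0.0), (1.0, 1.0)})
    area = 0.0
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# A useless detector: detection power equals false alarm probability.
chance = [(k / 10, k / 10) for k in range(11)]
# An ideal detector: detection power is one at every false alarm probability.
ideal = [(0.0, 1.0), (1.0, 1.0)]

print(detection_rate(chance), detection_rate(ideal))
```

The two test curves reproduce the limiting cases described above: the chance-level curve gives a DR of about 1/2 and the ideal curve gives a DR of 1.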

Assume that z is the projection resulting from applying a classifier $\delta(r)$ to the
observed pixel r. A signal detection model based on z can be described by a test of two
hypotheses as follows.

    $H_0: z = \delta(n) \cong p_0(z)$
    versus                                        (8.31)
    $H_1: z = \alpha_p + \delta(n) \cong p_1(z)$

where the null hypothesis $H_0$ and the alternative hypothesis $H_1$ represent the case of
noise alone, $z = \delta(n)$ (i.e., estimation error), and the case of the true target
signature abundance $\alpha_p$ present in z, respectively. The Neyman-Pearson detector for
(8.31) is given by

    $NPD(z) = \begin{cases} 1; & z \geq \tau \\ 0; & z < \tau \end{cases}$    (8.32)

where $\tau$ is the detection threshold. From (8.32), we can also define the false alarm
probability and detection power (detection probability) (Poor, 1994) as follows:

    $P_F = \int_{\tau}^{\infty} p_0(z)\,dz$    (8.33)

    $P_D = \int_{\tau}^{\infty} p_1(z)\,dz$    (8.34)

Using (8.34), the detection power specifies the capability of the NPD in detecting the
true target signature abundance $\alpha_p$. Therefore, the higher the detection power, the
smaller the estimation error and the better the classifier.

8.3.1 Signature Subspace Projection (SSP) Classifier 

Substituting $\delta_{SSP}(r)$ specified by (8.11) for $\delta(r)$ in (8.31) results in a
subpixel target detection model given by

    $H_0: z = \delta_{SSP}(n) = n_{SSP} \cong p_0(z)$
    versus                                        (8.35)
    $H_1: z = \alpha_p + n_{SSP} \cong p_1(z)$

where the noise $n_{SSP}$ is generated by the estimation error produced by the SSP
classifier. The hypothesis $H_0$ in (8.35) represents the case that the mixed pixel does
not contain the target signature d, while $H_1$ indicates the presence of d in the mixed
pixel.

Based on (8.11)-(8.13), we obtain the error covariance matrix $\Sigma_{SSP}$ for the
estimation error resulting from (8.12) as follows.

    $\Sigma_{SSP} = E[n_{SSP} n_{SSP}^T]$
    $\qquad = \dfrac{d^T P_U^\perp P_M P_M P_U^\perp d}{(d^T P_U^\perp d)(d^T P_U^\perp d)}\,\sigma^2$
    $\qquad = (d^T P_U^\perp d)^{-1} \sigma^2$    (8.36)



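Now, assume that n is a white Gaussian noise with zero mean and covariance matrix
$\sigma^2 I_{L \times L}$. Substituting (8.36) into (8.35) yields

    $p_0(z) = N(0, (d^T P_U^\perp d)^{-1} \sigma^2)$
    $p_1(z) = N(\alpha_p, (d^T P_U^\perp d)^{-1} \sigma^2)$    (8.37)

and the threshold given by

    $\tau_{SSP} = \sigma (d^T P_U^\perp d)^{-1/2}\, \Phi^{-1}(1 - P_F)$    (8.38)

where $\Phi(x)$ is the cumulative distribution function of the standard Gaussian random
variable given by

    $\Phi(x) = (1/2\pi)^{1/2} \int_{-\infty}^{x} e^{-t^2/2}\,dt$.    (8.39)

The desired detection power $P_{SSP,D}$ can be derived using (8.39) as follows

    $P_{SSP,D} = 1 - \Phi\left(\Phi^{-1}(1 - P_F) - \alpha_p \sqrt{d^T P_U^\perp d}\,/\,\sigma\right)$    (8.40)

    $\qquad\;\; = 1 - \Phi\left(\Phi^{-1}(1 - P_F) - \sqrt{\lambda_{SSP,max}}\right)$.    (8.41)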

In (8.40), $P_{SSP,D}$ is expressed in terms of $d^T P_U^\perp d$, which indicates the
degree of correlation between d and the projection $P_U^\perp d$. On the other hand,
(8.41) illustrates that the detection power is measured by the magnitude of the SNR or,
equivalently, the maximum eigenvalue. So, both (8.40) and (8.41) can be used to evaluate
the performance of the SSP classifier $\delta_{SSP}(r)$.
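
The dependence of the detection power on $d^T P_U^\perp d$ can be explored numerically. The sketch below assumes the equal-variance Gaussian densities of (8.37), for which the detection power at a given false alarm probability $P_F$ is $1 - \Phi(\Phi^{-1}(1 - P_F) - \alpha_p \sqrt{d^T P_U^\perp d}/\sigma)$. The function names and all numerical values are hypothetical.

```python
from statistics import NormalDist

PHI = NormalDist()  # standard Gaussian: PHI.cdf is Phi, PHI.inv_cdf is its inverse


def detection_power(p_false_alarm, alpha_p, sigma, dPd):
    """P_D = 1 - Phi(Phi^{-1}(1 - P_F) - alpha_p * sqrt(d^T P_U^perp d) / sigma)."""
    shift = alpha_p * dPd ** 0.5 / sigma
    return 1.0 - PHI.cdf(PHI.inv_cdf(1.0 - p_false_alarm) - shift)


def roc_curve(alpha_p, sigma, dPd, steps=200):
    """Sample (P_F, P_D) pairs on an open grid of false alarm probabilities."""
    grid = [(k + 0.5) / steps for k in range(steps)]
    return [(pf, detection_power(pf, alpha_p, sigma, dPd)) for pf in grid]


# Hypothetical values: the same abundance-to-noise ratio with a target that is
# well separated from U (dPd = 1.0) versus one that lies nearly inside <U>.
for dPd in (1.0, 0.05):
    pd_at_10pct = detection_power(0.10, alpha_p=0.3, sigma=0.6, dPd=dPd)
    print(f"d^T P d = {dPd}: P_D at P_F = 0.10 is {pd_at_10pct:.3f}")
```

With the abundance and noise level held fixed, the smaller value of $d^T P_U^\perp d$ gives a detection power closer to the false alarm probability, which is the behavior described in the text.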

It is important to notice that for a fixed false alarm probability $P_F$, (8.40) shows that
the discrimination power of $\delta_{SSP}(r)$ increases with the value of
$d^T P_U^\perp d$, whereas (8.41) suggests that the discrimination power of
$\delta_{SSP}(r)$ increases with the magnitude of the maximum SNR or $\lambda_{SSP,max}$.
This makes perfect sense since the former indicates the degree of similarity between the
target signature d and the undesired signatures in U, and the latter demonstrates that SNR
determines the detection performance. More interestingly, if we interpret
$d^T P_U^\perp d$ as an inner product of d and the projection $P_U^\perp d$, it measures
how much of d is projected onto $\langle U \rangle^\perp$, the orthogonal complement
space. The larger the projection, the less similarity between d and U, and thus the better
the discrimination. Additionally, the detection power given by (8.40) is also a function
of $\alpha_p$. The higher the $\alpha_p$, the better the detection of d. Despite the fact
that $\alpha_p$ is not present in (8.41), it is implicitly included in
$\lambda_{SSP,max}$ given by (8.13). So, both (8.40) and (8.41) are also determined by the
abundance fraction $\alpha_p$. All of these relationships can be well explained by the ROC
analysis.

8.3.2 Oblique Subspace Projection (OBSP) Classifier 



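Analogous to (8.35), the oblique classifier $\delta_{OBSP}(r)$ in (8.24) generates the
following subpixel target detection problem.

    $H_0: z = \delta_{OBSP}(n) = n_{OBSP} \cong p_0(z)$
    versus                                        (8.42)
    $H_1: z = \alpha_p + n_{OBSP} \cong p_1(z)$

with the noise covariance matrix $\Sigma_{OBSP}$ given by

    $\Sigma_{OBSP} = E[n_{OBSP} n_{OBSP}^T]$
    $\qquad = \dfrac{d^T P_U^\perp P_U^\perp d}{(d^T P_U^\perp d)(d^T P_U^\perp d)}\,\sigma^2$
    $\qquad = (d^T P_U^\perp d)^{-1} \sigma^2$.    (8.43)

Substituting (8.43) into (8.42) we obtain the same probability density functions
$p_0(z)$ and $p_1(z)$ given by (8.37) for the detection problem described by (8.42). As a
result, the threshold $\tau_{OBSP}$ is equal to the $\tau_{SSP}$ given by (8.38) and the
desired detection power $P_{OBSP,D}$ given by

    $P_{OBSP,D} = 1 - \Phi\left(\Phi^{-1}(1 - P_F) - \alpha_p \sqrt{d^T P_U^\perp d}\,/\,\sigma\right)$    (8.44)

    $\qquad\quad\;\; = 1 - \Phi\left(\Phi^{-1}(1 - P_F) - \sqrt{\lambda_{OBSP,max}}\right)$    (8.45)

is also identical to the detection power $P_{SSP,D}$. This implies that $\delta_{OBSP}(r)$
is essentially equivalent to $\delta_{SSP}(r)$ in the sense that they both generate
identical ROC curves.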



8.4 COMPUTER SIMULATIONS AND HYPERSPECTRAL IMAGE 
EXPERIMENTS 

In this section, computer simulations and the AVIRIS data in Fig. 1.5 will be used
for performance evaluation. Since OSP does not use a posteriori target knowledge, its
performance was not evaluated here. Furthermore, since the OBSP classifier is essentially
equivalent to the SSP classifier in terms of ROC analysis, experiments were conducted
only for the SSP classifier in this section.

8.4.1 Computer Simulations 

In the following simulations, two laboratory data sets are used to form the target
signature matrix $M = [m_1\ m_2\ m_3]$ with the abundance vector given by
$\alpha = (\alpha_1, \alpha_2, \alpha_3)^T$. We also let $d = m_3$ be the target signature
specified by abundance fraction $\alpha_3$ and $U = [m_1\ m_2]$ be the matrix made up of
undesired signatures with abundance fractions given by $\alpha_1, \alpha_2$. Data set 1
contains three field reflectances, dry grass, red soil and creosote leaves, shown in
Figure 1.5, with creosote leaves designated as d. Data set 2 consists of sagebrush,
blackbrush and creosote leaves, with U made up of sagebrush and blackbrush and d = creosote
leaves. The difference between data set 1 and data set 2 is that the spectrum of the target
signature in data set 2 is very similar to that of sagebrush, while all three spectra in
data set 1 are distinguishable. There are 50 mixed pixels simulated with abundances in
accordance with Table 8.1. Here, for illustrative purposes, we only consider the case in
which the undesired signature vectors share their abundances evenly.



Table 8.1. 50 simulated pixels with assigned abundances

                         Pixel no.   Pixel no.   Pixel no.   Pixel no.   Pixel no.
                         1-10        11-20       21-30       31-40       41-50
Target signature         1%          5%          10%         15%         20%
Undesired signature 1    49.5%       47.5%       45%         42.5%       40%
Undesired signature 2    49.5%       47.5%       45%         42.5%       40%


For the case of uneven abundances in U, we refer to Zhao (1996), which shows no
appreciable difference in the experiments, particularly for a desired signature with high
abundance. In addition, different Gaussian noise levels are simulated and added to
generate SNRs of 50:1, 30:1 and 10:1, with the SNR defined in Harsanyi and Chang (1994)
as 50% reflectance divided by the standard deviation of the noise. Figs. 8.1 and 8.2 show
the performance of the SSP classifier for data sets 1 and 2 respectively, where figures
labeled (a), (b) and (c) are results obtained with SNR 50:1, 30:1 and 10:1 respectively.
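
The simulation setup can be sketched as follows. This is a minimal illustration, assuming the linear mixture r = M alpha + n with white Gaussian noise whose standard deviation is 0.5/SNR (50% reflectance divided by the SNR). The 5-band signatures `m1`, `m2`, `m3` are hypothetical stand-ins; the laboratory spectra used in the experiments are not reproduced here.

```python
import random

# Hypothetical 5-band reflectances (fractions of full reflectance).
m1 = [0.30, 0.35, 0.40, 0.45, 0.50]   # undesired signature 1
m2 = [0.20, 0.25, 0.45, 0.40, 0.30]   # undesired signature 2
m3 = [0.05, 0.10, 0.50, 0.55, 0.35]   # target signature d

# Target abundance for each block of 10 pixels (1%, 5%, 10%, 15%, 20%).
TARGET_FRACTIONS = [0.01, 0.05, 0.10, 0.15, 0.20]


def abundances(pixel_index):
    """Abundance vector (alpha_1, alpha_2, alpha_3) for pixel 0..49."""
    a3 = TARGET_FRACTIONS[pixel_index // 10]
    a_u = (1.0 - a3) / 2.0            # undesired signatures share the rest evenly
    return a_u, a_u, a3


def simulate_pixel(pixel_index, snr, rng):
    """r = M alpha + n with white Gaussian noise of standard deviation 0.5 / snr."""
    a1, a2, a3 = abundances(pixel_index)
    sigma = 0.5 / snr                 # SNR = 50% reflectance / noise std
    return [a1 * x + a2 * y + a3 * z + rng.gauss(0.0, sigma)
            for x, y, z in zip(m1, m2, m3)]


rng = random.Random(0)                # fixed seed so runs are repeatable
pixels = [simulate_pixel(i, snr=30, rng=rng) for i in range(50)]
print(len(pixels), len(pixels[0]))
```

Changing `snr` to 50 or 10 gives the other two noise levels considered in the experiments.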



Figure 8.2. The performance of SSP for data set 2 (panels: SNR = 50, SNR = 30, SNR = 10)



Comparing Fig. 8.2 to Fig. 8.1 shows that the SNR and the spectral similarity among
signatures are significant factors in detection performance.

In order to evaluate the estimation error performance of the SSP classifier, the ROC
curves are plotted in Figs. 8.3 and 8.4 for data sets 1 and 2 with the same three
signatures used for Figs. 8.1 and 8.2. Figures labeled (a), (b) and (c) are results
generated by $\alpha_3 / \sigma$ = 0.5, 0.3 and 0.1 respectively.



Figure 8.3. ROC of SSP for data set 1 (panels: SNR = 50, SNR = 30, SNR = 10)

Figure 8.4. ROC of SSP for data set 2 (panels: SNR = 50, SNR = 30, SNR = 10)




It should be noted that each ROC curve is generated by one mixed pixel with a
given desired signature abundance to noise ratio (ANR), $\alpha_3 / \sigma$. For instance,
if $\alpha_3 / \sigma$ = 0.5 and $\alpha_3$ = 0.3, then the noise level will be
$\sigma$ = 0.6 and the other two undesired signatures will evenly split the remaining
abundance 0.7. Their detection rates (DRs) are tabulated in Table 8.2 for data set 1 and
data set 2 respectively, which measure the effects of different ANRs and spectral
similarity.





Table 8.2. Detection rates (DRs) for data sets 1 and 2

                            DR (data set 1)    DR (data set 2)
$\alpha_3 / \sigma$ = 0.5   0.71               0.57
$\alpha_3 / \sigma$ = 0.3   0.63               0.54
$\alpha_3 / \sigma$ = 0.1   0.55               0.52

8.4.2 Hyperspectral Data

A theoretical comparative analysis among subspace projection methods was
carried out in Chang et al. (1998b). An experiment-based comparison among the OSP
classifier, the OBSP classifier, UGMLC and the SSP classifier using the same AVIRIS
image in Figure 1.6(a) was reported in Chang et al. (1998b) and Chang and Ren
(2000). In the following experiments, the standardized HYDICE data shown in Figure
1.7(a) will be used for comparative analysis. Since both the OBSP classifier and the GML
classifier generate an identical estimation error given by (8.24) and (8.30), we will
focus our experiments only on the OSP classifier, the OBSP classifier and the SSP
classifier.

It is interesting to note that if we apply the a posteriori OSP,
$(d^T P_U^\perp d)^{-1} \delta_{OSP}(r)$, to model (3.2), it results in the same equations
given by (8.26) and (8.31) with both $\hat{\alpha}_{OBSP,p}$ and $\hat{\alpha}_{UGMLC,p}$
replaced by $\alpha_p$. This implies that if the knowledge about the abundance vector
$\alpha$ is given a priori, then the OBSP classifier and UGMLC are reduced to OSP. On the
other hand, if the abundance vector $\alpha$ is not known and needs to be estimated by
$\hat{\alpha}$, then the OBSP classifier and GML classifier will be used to replace the
OSP classifier. Consequently, OSP can be viewed as the a priori version of the OBSP
classifier and GML classifier, while the OBSP classifier and GML classifier can be
thought of as a posteriori OSP classifiers. Interestingly, despite the fact that the OSP
used in Harsanyi and Chang (1994) was considered to be a priori OSP, the conducted
experiments were actually based on a posteriori information where the target signatures,
$m_1, m_2, \ldots, m_p$, used in (3.1) were directly extracted from the image scene and
treated as if they were true target signatures.

As noted, the constant $(d^T P_U^\perp d)^{-1}$ accounts for the error of estimating the
target abundance fractions present in classified pixels. This constant has no impact on
images displayed on a computer because the abundance fractional images generated by OSP,
the SSP and the OBSP classifiers for computer display are all scaled to 256 gray levels.
In this case, the constant $(d^T P_U^\perp d)^{-1}$ is absorbed for computer display. So,
from a display point of view, they all produce identical results as shown in Fig. 8.5(a-c).

Figure 8.5. Classification of OSP, SSP and OBSP

In order to substantiate the difference between the abundance fractions generated by 
OSP, the SSP and the OBSP classifiers, we can take the absolute differences between 
these three classifiers and display their error images in 256 gray scales in Fig. 8.6(a-c). If 
two classifiers generate identical results, their absolute difference should be 0 and their 
corresponding error images should be all black. Obviously, this is not true as shown in 
Figure 8.6(a-c). This justifies the subtle difference among these three classifiers. 
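
The effect of display scaling can be illustrated with a small sketch. It assumes that scaling to 256 gray levels is done by min-max normalization, which is an assumption made here for illustration; the pixel values and the constant `kappa` are hypothetical.

```python
def to_gray_levels(values, levels=256):
    """Min-max scale a list of abundance values to integers 0..levels-1."""
    lo, hi = min(values), max(values)
    if hi == lo:                       # flat image: map everything to 0
        return [0 for _ in values]
    return [round((v - lo) / (hi - lo) * (levels - 1)) for v in values]


# Hypothetical detector outputs for six pixels and a hypothetical scale constant.
raw = [0.02, 0.10, 0.35, 0.50, 0.75, 0.90]
kappa = 0.5
scaled = [kappa * v for v in raw]

# The displayed images coincide, although the underlying values do not.
same_display = to_gray_levels(raw) == to_gray_levels(scaled)
abs_diff = [abs(a - b) for a, b in zip(raw, scaled)]
print(same_display, max(abs_diff) > 0)
```

Under this assumption a positive scale constant leaves the gray-level image unchanged, while the absolute difference of the unscaled values is nonzero, which is why the error images are needed to reveal the difference.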






If we further compare Figs. 8.6(b) and 8.6(c) to Fig. 8.6(a), the error images
in Figs. 8.6(b) and 8.6(c) are very close to each other, but very different from
Fig. 8.6(a). This implies that OSP and the OBSP classifier performed nearly the same
except that they estimated different amounts of target abundance fractions. On the other
hand, the SSP classifier performed quite differently from OSP in that the SSP classifier
includes an additional signature subspace projector $P_M$ and the constant
$(d^T P_U^\perp d)^{-1}$ in its classifier. As a result, the estimation error generated by
the SSP classifier using (8.11) is very different from that by OSP using (3.9). This fact
was reflected in Fig. 8.6(a). It was shown in Tu et al. (1997) via ROC analysis that the
SSP classifier greatly improved on the OSP classifier in terms of SNR if the additive
noise is assumed to be Gaussian. An error theory using ROC analysis for a posteriori OSP
and OSP is further investigated in Chang (1999).



8.5 CONCLUSIONS 

This chapter extends the orthogonal subspace projection (OSP) detector in Chapter 3 
to a family of unconstrained a posteriori OSP classifiers which perform abundance 
fraction estimation rather than abundance fraction detection as does OSP. Three such a 
posteriori OSP classifiers, SSP classifier, TSP classifier and OBSP classifier, were 
introduced, which are all based on a least-squares subspace projection approach. The SSP 
classifier estimates the target signature by projecting a mixed pixel into the signature 
subspace from which the target signature can be extracted by a matched filter. Rather than 
mapping a mixed pixel into the entire signature subspace, the TSP classifier projects the
pixel directly into the target subspace. Unfortunately, it does not produce
satisfactory performance because the undesired signatures are also projected
and mixed into the target subspace. As a consequence, an unknown signature bias
is created. In order to eliminate this bias, two approaches are considered. One is the 
unsupervised vector quantization-based TSP classifier (Brumbley and Chang, 1999). 
Another is the OBSP classifier that projects the desired target signatures into its range 
space while eliminating the undesired target signatures by mapping them into its null 
space. As a consequence, the OBSP classifier is no longer an orthogonal projection 
classifier. Nevertheless, the experiments demonstrated that both the SSP classifier and 
the OBSP classifier performed almost the same in terms of target detection. In particular, 
they generate identical ROC curves. However, as will be demonstrated in Chapter 9, we 
should not draw the same conclusion on their ability in abundance estimation. It is also 
interesting to note that if the noise in (3.1) is assumed to be additive Gaussian noise, the 
OBSP classifier becomes an unconstrained GML classifier, which is actually an a priori 
estimator. In this case, the unconstrained GML classifier can be considered as a special 
case of the OBSP classifier, which is indeed an a posteriori estimator. 




9 



A QUANTITATIVE ANALYSIS OF MIXED-TO-PURE 

PIXEL CONVERSION (MPCV) 



Over the past years many algorithms have been developed for multispectral and 
hyperspectral image classification. Due to a lack of standardized data, these algorithms 
have not been rigorously compared within a unified framework. In this chapter, we 
present a comparative study of several popular classification algorithms through a 
standardized HYDICE data set with custom-designed detection and classification criteria. 
The algorithms to be considered for this study are those developed in Chapter 8, viz. the 
orthogonal subspace projection (OSP), unconstrained Gaussian maximum likelihood 
(GML) classifier, minimum distance, and Fisher's linear discriminant analysis (LDA). In 
order to compare mixed pixel classification (MPC) algorithms against pure pixel 
classification (PPC) algorithms, a mixed pixel is converted to a pure pixel via a mixed- 
to-pure pixel converter (MPCV). The standardized HYDICE data are then used to 
evaluate the performance of various PPC and MPC algorithms. Since the precise spatial 
locations of all the targets in the standardized HYDICE data are available, the candidate 
algorithms can be evaluated by tallying the number of targets detected and classified for 
quantitative analysis. 



9.1 INTRODUCTION 

Mixed pixel classification (MPC) becomes increasingly important in hyperspectral 
image analysis and provides an effective means for detection, classification, 
discrimination, quantification, etc. New algorithms have been reported every year. Due to 
a lack of standardized data, it is difficult to substantiate each of these algorithms. In 
addition, no unified criterion has been accepted for rigorous and impartial comparison. 
This chapter develops a set of custom-designed criteria for MPC and conducts a 
comparative study among several well-known classification algorithms. We confine our
study to two types of classification, MPC and pure pixel classification (PPC). A general
approach to MPC is to estimate the abundance fraction of a target signature of interest
present in an image pixel; the estimated abundance fraction is then used to classify the
pixel. One such method is spectral unmixing discussed in Chapter 8. However, the images
generated by MPC are generally gray scale with gray level values representing the amounts
of the estimated abundance fractions present in pixels. Consequently, it requires visual
interpretation of these gray scale abundance
fractional images. Such human intervention is rather subjective and may not be reliable 
or repeatable. With no availability of standardized data or objective criteria a quantitative 
analysis for mixed pixel classification is nearly impossible. By contrast, PPC does not 
have such a problem. Unlike MPC it does not require estimation of signature abundance 
fractions for class-membership assignment. Its performance is completely determined on 
a pure pixel basis by criteria used for classification. So, two major issues are addressed in 
this chapter. One establishes a link between MPC and PPC via a mixed-to-pure pixel 
converter (MPCV). Another develops a set of acceptable criteria to conduct a comparative 
and quantitative analysis among commonly used MPC and PPC algorithms. The 
selection of candidate algorithms to be compared in this chapter is made for the 
following three major reasons. (1) As shown in Chapter 8, if the noise in a linear mixing 
problem is white Gaussian, the Gaussian maximum likelihood (GML) is essentially 
equivalent to OSP and identical to OBSP. So, this allows us to restrict MPC algorithms 
to a family of OSP-based classifiers. (2) Additionally, if we also assume that the sample 
spectral correlation/covariance matrix is stationary, the GML classifier is then further 
reduced to a minimum distance classifier. In this case, only minimum distance classifiers 
are considered. (3) Fisher’s linear discriminant analysis (LDA) has been shown to be very
successful in pattern classification because its criterion is based on
maximization of class separability. When PPC is considered, Fisher’s LDA must be
included for study. The difference between OSP and the other approaches (i.e., minimum
distance, Fisher’s LDA) is that OSP is primarily designed for MPC, whereas the latter are
developed for PPC. Nevertheless, both can be used for MPC and PPC. As will be
shown, MPC can be reinterpreted by imposing appropriate constraints on the abundance 
fractions and further reduced to PPC by a mixed-to-pure pixel conversion (MPCV). 
Through an MPCV a direct comparison between an MPC algorithm and a PPC 
algorithm is possible. 



9.2 CONVERSION OF MPC TO PPC 

Most MPC algorithms estimate the abundance fraction vector
$\alpha = (\alpha_1, \alpha_2, \ldots, \alpha_p)^T$ present in a pixel vector r using a
linear mixture model given by (3.1). Since the model specified by (3.1) is not used to
estimate $\alpha$ for OSP, it is considered an a priori model. On the other hand, the a
posteriori model described by (8.1) requires an estimation of $\alpha$, which leads to an
a posteriori OSP approach. In both cases, the images generated by MPC algorithms are
referred to as abundance fractional images, which are gray scale. The gray level values of
the abundance fractional images are specified by the abundance fractions estimated for the
desired target signature d present in each mixed pixel vector r in the images. The only
difference between the a priori model and the a posteriori model lies in the fact that the
abundance fractions generated by the a priori model do not reflect true amounts of
abundance fractions resident in the image pixel. Thus, these abundance fractions can be
used only for detection and classification but not quantification. On the other hand, the
images generated by the a posteriori model are estimates of true amounts of abundance
fractions. Therefore, they can be better used for target classification and material
quantification.

As noted, MPC is usually carried out by visual interpretation based on the amounts
of abundance fractions estimated from image pixels. So, technically speaking, OSP and a
posteriori OSP perform more than classification. They are essentially signature
abundance estimation methods. In order for these algorithms to be used as classifiers
without human interpretation, we need a mechanism, called a mixed-to-pure pixel
converter (MPCV), that can threshold the estimate of an abundance fraction to a
class-membership decision. This MPCV is very similar to an operator that has been widely
used in communications and signal processing, called an analog-to-digital converter (A/D
converter). Two versions of MPCV are suggested in this chapter, which are the
winner-take-all (WTA) and abundance percentage thresholding methods. The former is good
for the case that the estimates of the abundance fractions of all the target signatures
present in r are available, such as those produced by OSP in Chapters 3 and 8. The latter
is generally applied to the case that only the estimates of target signatures of interest
are available, e.g., those generated by LCMV developed in Chapter 4.
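
As a rough illustration of the abundance percentage thresholding idea, the sketch below converts abundance estimates of a single target signature into a binary target map. The min-max normalization step, the function name `threshold_mpcv` and the cut-off parameter `percent` are assumptions made for this example; the exact thresholding rule may differ.

```python
def threshold_mpcv(abundances, percent):
    """Binary target map: 1 where the normalized abundance reaches percent/100.

    Normalizing to [0, 1] by the image minimum and maximum is an assumption
    made for this sketch; `percent` is the hypothetical cut-off in percent.
    """
    lo, hi = min(abundances), max(abundances)
    span = hi - lo
    cut = percent / 100.0
    result = []
    for a in abundances:
        normalized = (a - lo) / span if span > 0 else 0.0
        result.append(1 if normalized >= cut else 0)
    return result


# Hypothetical abundance estimates of one target signature over eight pixels.
estimates = [-0.02, 0.05, 0.10, 0.48, 0.51, 0.80, 0.97, 0.30]
print(threshold_mpcv(estimates, percent=50))
```

Only one signature's estimates are needed here, which is the situation the thresholding version of MPCV is meant for.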

9.2.1 Mixed-to-Pure Pixel Converter (MPCV) 

In order to compare PPC to MPC, we need to interpret an MPC problem in the
context of PPC. One way to convert abundance estimation of a mixed pixel to
classification of a pure pixel is to consider model (3.1) as a constrained problem with
some specific restrictions imposed on the estimated abundance vector $\hat{\alpha}$.

Assume that the abundance vector $\alpha$ in model (3.1) satisfies the constraints
$\alpha_j \geq 0$ for all $1 \leq j \leq p$ and $\sum_{j=1}^{p} \alpha_j = 1$.
Additionally, the estimate $\hat{\alpha}$ is constrained to a set of p-dimensional vectors
with one in only one component and zeros in the remaining p-1 components. Such vectors
will be referred to as p-dimensional unit vectors. Let $u_j$ be a p-dimensional vector
with 1 in its j-th component and 0's in all other components, i.e.,
$u_j = (0, \ldots, 0, 1, 0, \ldots, 0)^T$; then $u_j$ is called the j-th p-dimensional unit
vector. In this case, the estimated abundance vector $\hat{\alpha}$ must be forced to a
pure abundance vector and there are only p choices for $\hat{\alpha}$. In other words,
$\hat{\alpha}$ can be assigned to one and only one of p classes. By virtue of this
constraint an MPC problem is reduced to a p-class classification problem. As a result, it
can be solved by PPC techniques. Imposing a constraint to p-dimensional unit vectors, the
a posteriori model (8.1) becomes



^^^(r) = Mu^ = m. for some \ < j<p (9.1) 

where Xmpcv(*') called a mixed-to-pure pixel converter (MPCV) operating on a pixel 
vector r that assigns r to signature for some y . It should be noted that the estimated 
noise n in model (8.1) has been absorbed into u^. to account for misclassification error. 

So, if we reinterpret model (3.1) using (9.1), each target signature vector in M represents 
a distinct class and an image pixel vector r will be assigned to one of the signatures in M 
via the MPCV specified by (9.1). The image resulting from MPCV is a binary image, 
which shows only target pixels. An important but difficult task is to design an effective 
MPCV for (9.1), which must preserve as much information as possible contained in 
mixed pixels during MPCV. 

One method, recently proposed in Ren (2000), is based on a winner-take-all
(WTA) thresholding criterion. It is very similar to the winner-take-all learning algorithm
used in neural networks (Haykin, 1994). This WTA thresholding criterion can be used for
(9.1) to convert a mixed pixel to a pure pixel.

Assume that there are p target signatures $\{\mathbf{m}_j\}_{j=1}^{p}$ where $\mathbf{m}_j$ is the j-th signature. Let
r be a mixed pixel vector to be classified and $\hat{\boldsymbol{\alpha}}(\mathbf{r}) = (\hat{\alpha}_1(\mathbf{r}), \ldots, \hat{\alpha}_p(\mathbf{r}))^T$ be its
estimated p-dimensional abundance vector. Suppose that $\hat{\alpha}_j(\mathbf{r})$ is the estimated
abundance fraction of $\mathbf{m}_j$ contained in r that is produced by an MPC classifier. We then
compare all estimated abundance fractions $\{\hat{\alpha}_1(\mathbf{r}), \hat{\alpha}_2(\mathbf{r}), \ldots, \hat{\alpha}_p(\mathbf{r})\}$ and find the one
with the maximum abundance fraction, whose signature will be assigned to r. Using such a WTA
thresholding criterion, (9.1) becomes

    $\chi_{\mathrm{WTAMPCV}}(\mathbf{r}) = \mathbf{M}\mathbf{u}_{j^*}$    (9.2)

with $j^* = \arg\{\max_{1 \le j \le p} \hat{\alpha}_j(\mathbf{r})\}$. In this case, $\hat{\alpha}_{j^*}(\mathbf{r}) = 1$ and $\hat{\alpha}_j(\mathbf{r}) = 0$ for $j \ne j^*$.
The MPCV defined by (9.2) is called WTAMPCV. As a result of the WTA criterion, the
mixed abundance vector $\hat{\boldsymbol{\alpha}}$ is converted to a pure abundance vector, the $j^*$-th
p-dimensional unit vector $\mathbf{u}_{j^*} = (0,\ldots,0,1,0,\ldots,0)^T$ with $j^* = \arg\{\max_{1 \le j \le p} \hat{\alpha}_j(\mathbf{r})\}$.

Another simple MPCV is to use the abundance percentage as a cut-off threshold
value. If the estimated abundance fraction $\hat{\alpha}_j$ of $\mathbf{m}_j$ goes beyond a certain percentage
within an image pixel vector r, then r will be assigned to $\mathbf{m}_j$. More precisely, assume
that a% is the abundance percentage used as a desired cut-off threshold value. Like (9.2),
we can use a% as a thresholding value to define an a%MPCV via (9.1) by

    $\chi_{a\%\mathrm{MPCV}}(\mathbf{r}) = \mathbf{M}\mathbf{u}_j = \mathbf{m}_j$ if $\hat{\alpha}_j(\mathbf{r}) \ge a \times 10^{-2}$.    (9.3)

Using (9.3), the mixed abundance vector $\hat{\boldsymbol{\alpha}}$ is converted to a pure abundance vector,
which is a p-dimensional unit vector $\mathbf{u}_j$. As a consequence, a mixed pixel resulting from
a%MPCV may be classified into multiple target classes if two or more estimated
abundance fractions exceed the threshold value a%. This is not the case for WTAMPCV,
which assigns one and only one p-dimensional unit vector to r. Nevertheless,
a%MPCV has an advantage over WTAMPCV in the sense that WTAMPCV requires the
complete target knowledge of an image scene. So, WTAMPCV is suitable for linear
unmixing methods such as the OSP-based classifiers. By contrast, a%MPCV does
not need full knowledge of target information. Therefore, it is good for detectors and
classifiers such as the ones discussed in Chapter 4, where a%MPCV was used to produce
Tables 4.1-4.4. However, a%MPCV also has a disadvantage: an appropriate
percentage threshold value must be selected to make it effective.
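
The two converters can be illustrated with a minimal sketch. It assumes that the abundance estimates are already available as a plain list, that the threshold test is "greater than or equal to" a%, and that WTA ties go to the lowest index; the function names and the sample estimate are hypothetical and are not taken from Ren (2000).

```python
from typing import List, Sequence


def wta_mpcv(abundances: Sequence[float]) -> List[int]:
    """Winner-take-all MPCV in the spirit of (9.2): return the unit vector u_{j*}."""
    # j* is the index of the largest estimated abundance fraction.
    # Assumption: ties are broken in favor of the lowest index.
    j_star = max(range(len(abundances)), key=lambda j: abundances[j])
    return [1 if j == j_star else 0 for j in range(len(abundances))]


def percent_mpcv(abundances: Sequence[float], a_percent: float) -> List[int]:
    """a% MPCV in the spirit of (9.3): flag every signature that reaches a%."""
    threshold = a_percent * 1e-2
    # Unlike WTA, zero, one or several entries may be flagged.
    return [1 if alpha >= threshold else 0 for alpha in abundances]


estimate = [0.55, 0.30, 0.15]             # hypothetical abundance estimates
wta_label = wta_mpcv(estimate)            # exactly one class is selected
pct_labels = percent_mpcv(estimate, 25)   # two classes reach the 25% cut-off
none_flagged = percent_mpcv(estimate, 60) # no class reaches the 60% cut-off
```

The last two calls show the behavior discussed above: a%MPCV may assign a pixel to several classes or to none, whereas WTAMPCV always returns exactly one unit vector.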

9.2.2 Minimum Distance-Based Classification 

In Section 9.2.1, we described WTAMPCV, which directly converts abundance
estimation of a mixed pixel to classification of a pure pixel. In the following two
subsections, we use (9.1) as a vehicle to reinterpret two commonly used pure pixel
classification methods, minimum distance-based classification and Fisher's linear
discriminant analysis, in the context of constrained MPC.

As noted in (9.1), there is no noise term present in the equation. This is because the
noise is already included as part of the misclassification error. As shown in Chapter 8,
when the noise in model (3.1) was assumed to be white Gaussian, the mixed pixel
classifiers, OSP and a posteriori OSP, became the unconstrained Gaussian maximum
likelihood (GML) classifier, namely,

    $\hat{\boldsymbol{\alpha}}_{\mathrm{GML}} = \arg\{\max_{\boldsymbol{\alpha} \in \Delta} p(\mathbf{r})\}$.    (9.4)

In this case, it should be noted that the $\Delta$ in (9.4) is confined to the set
$\Delta = \{(1,0,\ldots,0)^T, (0,1,\ldots,0)^T, \ldots, (0,0,\ldots,1)^T\}$ that contains only p unit
vectors, i.e., $\Delta = \{\hat{\boldsymbol{\alpha}} = (\hat{\alpha}_1,\ldots,\hat{\alpha}_p)^T \mid \hat{\alpha}_j = 1 \text{ for some } j,\ \hat{\alpha}_i = 0 \text{ for } i \ne j\}$. In other
words, the estimated abundance vector $\hat{\boldsymbol{\alpha}}$ in (9.4) must be a p-dimensional unit vector.
Since there are p components, only p options are in $\Delta$ for $\hat{\boldsymbol{\alpha}}_{\mathrm{GML}}$. If we further assume that
the sample covariance matrix used in $\hat{\boldsymbol{\alpha}}_{\mathrm{GML}}$ is constant for all sample pixel vectors (i.e.,
the sample covariance matrix is independent of sample vectors), the classifier using (9.4)
is then simplified to a classifier that is based on the minimum distance of r from the class
means $\{\mathbf{m}_j\}_{j=1}^{p}$. As a result, an image pixel vector r is assigned to $\mathbf{m}_{j^*}$ according to

    $\chi_{\mathrm{GML}}(\mathbf{r}) = \mathbf{M}\mathbf{u}_{j^*} = \mathbf{m}_{j^*}$    (9.5)

where $j^* = \arg\{\min_{1 \le j \le p} \|\mathbf{r} - \mathbf{m}_j\|\}$.

More generally, assume that $\mathbf{x} = (x_1,\ldots,x_L)^T$ is an image pixel vector to be classified
in a hyperspectral image. Let $\{\omega_j\}_{j=1}^{p}$ be a set of classes of interest and $\omega_j$ be the
class represented by the j-th signature $\mathbf{m}_j = (m_{j1},\ldots,m_{jL})^T$. In our case, classes
are target classes of interest and $\mathbf{m}_j$ is the j-th target signature that belongs to class $\omega_j$.
Assume that $\mathbf{x}_i^j$ is the i-th sample vector in class $\omega_j$ and $S = \{\mathbf{x}_i^j\}_{i=1,j=1}^{N_j,\,p}$ is the set of
sample vectors to be classified, where $N_j$ is the number of sample vectors in class
$\omega_j$ and $N = N_1 + \cdots + N_p$ is the total number of sample vectors. Two types of distance-based
classifiers can be considered, depending upon the sample statistics used.

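1) First-order statistics classifiers:

Minimum distance classifier:

(a) Euclidean distance (ED):

    $\mathrm{ED}(\mathbf{x},\mathbf{m}_j) = \sqrt{(\mathbf{x}-\mathbf{m}_j)^T(\mathbf{x}-\mathbf{m}_j)} = \left[\sum_{l=1}^{L}(x_l-m_{jl})^2\right]^{1/2}$    (9.6)

Since the quadratic term in x of (9.6) is independent of class $\omega_j$ (squaring (9.6) gives
$\mathbf{x}^T\mathbf{x} - 2\mathbf{m}_j^T\mathbf{x} + \mathbf{m}_j^T\mathbf{m}_j$, and $\mathbf{x}^T\mathbf{x}$ is common to all classes), the Euclidean
distance-based minimum distance classifier is a linear classifier.

(b) City block distance (CBD):

    $\mathrm{CBD}(\mathbf{x},\mathbf{m}_j) = \sum_{l=1}^{L} |x_l - m_{jl}|$    (9.7)

(c) Tchebyshev (maximum) distance (TD):

    $\mathrm{TD}(\mathbf{x},\mathbf{m}_j) = \max_{1 \le l \le L} |x_l - m_{jl}|$    (9.8)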
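
As a small illustration of the first-order classifiers, the sketch below evaluates the Euclidean, city block and Tchebyshev distances and assigns a pixel to the nearest signature in the sense of (9.5). The signatures and the pixel are hypothetical three-band values chosen only to make the arithmetic easy to follow.

```python
import math
from typing import Callable, List, Sequence

Vector = Sequence[float]


def euclidean(x: Vector, m: Vector) -> float:
    """Euclidean distance, as in (9.6)."""
    return math.sqrt(sum((xl - ml) ** 2 for xl, ml in zip(x, m)))


def city_block(x: Vector, m: Vector) -> float:
    """City block distance, as in (9.7)."""
    return sum(abs(xl - ml) for xl, ml in zip(x, m))


def tchebyshev(x: Vector, m: Vector) -> float:
    """Tchebyshev (maximum) distance, as in (9.8)."""
    return max(abs(xl - ml) for xl, ml in zip(x, m))


def classify(x: Vector, signatures: List[Vector],
             distance: Callable[[Vector, Vector], float] = euclidean) -> int:
    """Return the index j* of the signature closest to x."""
    return min(range(len(signatures)), key=lambda j: distance(x, signatures[j]))


# Hypothetical three-band signatures and one pixel to classify.
signatures = [[0.0, 0.0, 0.0], [1.0, 1.0, 1.0], [4.0, 0.0, 0.0]]
pixel = [0.9, 1.2, 0.8]
nearest = classify(pixel, signatures)   # index of the nearest signature under ED
```

Passing `city_block` or `tchebyshev` as the `distance` argument switches the measure without changing the decision rule.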



2) Second-order statistics classifiers:

(a) Mahalanobis distance classifier (Fukunaga, 1991):

    $\mathrm{MD}(\mathbf{x},\mathbf{m}_j) = (\mathbf{x}-\mathbf{m}_j)^T \Sigma_j^{-1} (\mathbf{x}-\mathbf{m}_j)$    (9.9)

In general, the Mahalanobis distance classifier is a quadratic classifier. However,
when $\Sigma_j = \sigma^2\mathbf{I}$ is the same for all classes, the Mahalanobis distance classifier is
reduced to the Euclidean distance classifier.

(b) Bhattacharyya distance classifier (Fukunaga, 1991):

    $B = \frac{1}{8}(\mathbf{m}_i-\mathbf{m}_j)^T\left(\frac{\Sigma_i+\Sigma_j}{2}\right)^{-1}(\mathbf{m}_i-\mathbf{m}_j) + \frac{1}{2}\ln\frac{\left|(\Sigma_i+\Sigma_j)/2\right|}{\sqrt{|\Sigma_i|\,|\Sigma_j|}}$    (9.10)

In the special case that $\Sigma_i = \Sigma_j$ for classes $\omega_i$ and $\omega_j$, the Bhattacharyya
distance classifier is reduced to the Mahalanobis distance classifier.

In case the covariance matrices $\Sigma$ in (9.9) and (9.10) are not of full rank, their inverses
will be replaced by their pseudo-inverses, $\Sigma^{\#}$.
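
The reduction of the Bhattacharyya distance to the Mahalanobis distance can be checked in two lines. The derivation below is a sketch that assumes (9.10) takes the standard form given in Fukunaga (1991) and that $\Sigma_i = \Sigma_j = \Sigma$ is positive definite, so that $|\Sigma| > 0$.

```latex
% Assumption: \Sigma_i = \Sigma_j = \Sigma, positive definite.
\begin{align*}
\frac{\Sigma_i+\Sigma_j}{2} &= \Sigma,
\qquad
\frac{\left|(\Sigma_i+\Sigma_j)/2\right|}{\sqrt{|\Sigma_i|\,|\Sigma_j|}}
  = \frac{|\Sigma|}{\sqrt{|\Sigma|^{2}}} = 1, \\
B &= \frac{1}{8}(\mathbf{m}_i-\mathbf{m}_j)^T \Sigma^{-1} (\mathbf{m}_i-\mathbf{m}_j)
     + \frac{1}{2}\ln 1
   = \frac{1}{8}\,\mathrm{MD}(\mathbf{m}_i,\mathbf{m}_j).
\end{align*}
```

The logarithmic term vanishes, so the Bhattacharyya distance becomes a fixed multiple of the Mahalanobis distance between the two class means and ranks class pairs in the same order.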

9.2.3 Fisher’s Linear Discriminant Analysis (LDA) 

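Let $R^d$ and $R^{\tilde{d}}$ be d-dimensional and $\tilde{d}$-dimensional vector spaces with $\tilde{d}$ usually
less than d, $\tilde{d} < d$, where $R^d$ and $R^{\tilde{d}}$ are considered to be the original data space and
feature space respectively. Let $\{\omega_j\}_{j=1}^{c}$ denote c classes of interest and
$\omega_j = \{\mathbf{x}_1^j, \mathbf{x}_2^j, \ldots, \mathbf{x}_{N_j}^j\} = \{\mathbf{x}_i^j\}_{i=1}^{N_j}$ be the j-th class containing $N_j$ samples, where
$\mathbf{x}_i^j$ is a d-dimensional vector in the space $R^d$ and the i-th sample in
class $\omega_j$. Let $N = \sum_{j=1}^{c} N_j$ be the total number of training samples. From Fisher's
discriminant analysis (Duda and Hart, 1973), we can form total, between-class and
within-class scatter matrices as follows.

Let $\boldsymbol{\mu} = (1/N)\sum_{j=1}^{c}\sum_{i=1}^{N_j} \mathbf{x}_i^j$ be the global mean and $\boldsymbol{\mu}_j = (1/N_j)\sum_{i=1}^{N_j} \mathbf{x}_i^j$ the mean of
class $\omega_j$. Then

    $\mathbf{S}_T = \sum_{j=1}^{c}\sum_{i=1}^{N_j} (\mathbf{x}_i^j - \boldsymbol{\mu})(\mathbf{x}_i^j - \boldsymbol{\mu})^T$    (9.11)

    $\mathbf{S}_W = \sum_{j=1}^{c}\sum_{i=1}^{N_j} (\mathbf{x}_i^j - \boldsymbol{\mu}_j)(\mathbf{x}_i^j - \boldsymbol{\mu}_j)^T$    (9.12)

    $\mathbf{S}_B = \sum_{j=1}^{c} N_j (\boldsymbol{\mu}_j - \boldsymbol{\mu})(\boldsymbol{\mu}_j - \boldsymbol{\mu})^T$    (9.13)

From (9.11)-(9.13), $\mathbf{S}_T = \mathbf{S}_W + \mathbf{S}_B$.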
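
The decomposition of the total scatter into within-class and between-class scatter can be checked numerically. The sketch below assumes the usual definitions of the three scatter matrices (sums of outer products about the global mean, about the class means, and of the class means about the global mean weighted by class size); the two-band training samples and all function names are hypothetical.

```python
from typing import Dict, List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def mean(vectors: List[Vector]) -> Vector:
    """Component-wise mean of a list of equal-length vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def outer(u: Vector) -> Matrix:
    """Outer product u u^T."""
    return [[a * b for b in u] for a in u]


def add_scaled(acc: Matrix, term: Matrix, weight: float = 1.0) -> None:
    """In-place update acc += weight * term."""
    for row_acc, row_term in zip(acc, term):
        for k, value in enumerate(row_term):
            row_acc[k] += weight * value


def scatter_matrices(classes: Dict[str, List[Vector]]) -> Tuple[Matrix, Matrix, Matrix]:
    """Return the total, within-class and between-class scatter matrices."""
    samples = [x for members in classes.values() for x in members]
    d = len(samples[0])
    mu = mean(samples)                                   # global mean
    s_t = [[0.0] * d for _ in range(d)]
    s_w = [[0.0] * d for _ in range(d)]
    s_b = [[0.0] * d for _ in range(d)]
    for x in samples:
        add_scaled(s_t, outer([a - b for a, b in zip(x, mu)]))
    for members in classes.values():
        mu_j = mean(members)                             # class mean
        for x in members:
            add_scaled(s_w, outer([a - b for a, b in zip(x, mu_j)]))
        add_scaled(s_b, outer([a - b for a, b in zip(mu_j, mu)]), weight=len(members))
    return s_t, s_w, s_b


# Hypothetical two-band training samples for two classes.
training = {
    "w1": [[1.0, 2.0], [2.0, 3.0], [3.0, 3.0]],
    "w2": [[6.0, 5.0], [7.0, 8.0]],
}
s_t, s_w, s_b = scatter_matrices(training)
```

For these samples the entries of `s_t` agree with those of `s_w` plus `s_b` up to floating-point rounding.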

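Assume that $S = \{\mathbf{x}_i^j\}_{i=1,j=1}^{N_j,\,c}$ is a set of training data samples.
Fisher's discriminant analysis finds a $d \times \tilde{d}$ weight matrix $\mathbf{W} = [\mathbf{w}_1\ \mathbf{w}_2 \cdots \mathbf{w}_{\tilde{d}}]$ that
projects the i-th sample $\mathbf{x}_i^j$ in class $\omega_j$ in $R^d$ onto $\mathbf{y}_i^j$, which is in a low-dimensional
feature space $R^{\tilde{d}}$, in such a manner that the resulting projected data samples $\{\mathbf{y}_i^j\}_{i=1,j=1}^{N_j,\,c}$
yield the best possible class separability by the following matrix transformation

    $\mathbf{y}_i^j = \mathbf{W}^T \mathbf{x}_i^j$    (9.14)

where

    $y_{ik}^j = \mathbf{w}_k^T \mathbf{x}_i^j$ for $1 \le k \le \tilde{d}$    (9.15)

and $\mathbf{w}_k$ is the k-th column vector with dimensionality $d \times 1$ in $\mathbf{W}$ and
$\mathbf{y}_i^j = (y_{i1}^j, y_{i2}^j, \ldots, y_{i\tilde{d}}^j)^T$.

Using (9.11)-(9.14) we can define similar within-class and between-class scatter
matrices for the projected samples $\{\mathbf{y}_i^j\}$ by

    $\tilde{\mathbf{S}}_W = \sum_{j=1}^{c}\sum_{i=1}^{N_j} (\mathbf{y}_i^j - \tilde{\boldsymbol{\mu}}_j)(\mathbf{y}_i^j - \tilde{\boldsymbol{\mu}}_j)^T$    (9.16)

    $\tilde{\mathbf{S}}_B = \sum_{j=1}^{c} N_j (\tilde{\boldsymbol{\mu}}_j - \tilde{\boldsymbol{\mu}})(\tilde{\boldsymbol{\mu}}_j - \tilde{\boldsymbol{\mu}})^T$    (9.17)

where $\tilde{\boldsymbol{\mu}} = (1/N)\sum_{j=1}^{c}\sum_{i=1}^{N_j} \mathbf{y}_i^j$ and $\tilde{\boldsymbol{\mu}}_j = (1/N_j)\sum_{i=1}^{N_j} \mathbf{y}_i^j$.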
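Substituting (9.14) and (9.15) into (9.16) and (9.17) and comparing with (9.11)-(9.13) results in

    $\tilde{\mathbf{S}}_W = \mathbf{W}^T \mathbf{S}_W \mathbf{W}$    (9.18)

    $\tilde{\mathbf{S}}_B = \mathbf{W}^T \mathbf{S}_B \mathbf{W}$.    (9.19)

In order to find an optimal linear transformation matrix $\mathbf{W}$ of size $d \times \tilde{d}$ in terms of
maximizing class separability, we use Fisher's discriminant function ratio, called the Rayleigh
quotient, which is the ratio of the between-class scatter matrix to the within-class scatter
matrix given by

    $J(\mathbf{W}) = \frac{|\tilde{\mathbf{S}}_B|}{|\tilde{\mathbf{S}}_W|} = \frac{|\mathbf{W}^T \mathbf{S}_B \mathbf{W}|}{|\mathbf{W}^T \mathbf{S}_W \mathbf{W}|}$    (9.20)

where $|\cdot|$ is the determinant of a matrix. The optimal solution to (9.20), denoted by

    $\mathbf{W}^* = \arg\{\max_{\mathbf{W}} J(\mathbf{W})\}$    (9.21)

can be found by solving the following generalized eigenvalue problem

    $\mathbf{S}_B \mathbf{w}_k^* = \lambda_k \mathbf{S}_W \mathbf{w}_k^*$, for $1 \le k \le \tilde{d}$    (9.22)

with $\mathbf{w}_k^*$ corresponding to the eigenvalue $\lambda_k$ generated by (9.22). These $\tilde{d}$ eigenvectors,
$\{\mathbf{w}_k^*\}_{k=1}^{\tilde{d}}$, obtained from (9.22) form a set of Fisher's linear discriminants to yield a new
set of feature vectors

    $y_{ik}^j = (\mathbf{w}_k^*)^T \mathbf{x}_i^j$ for $1 \le k \le \tilde{d}$    (9.23)

where $y_{ik}^j = \mathbf{w}_k^T \mathbf{x}_i^j$ is given by (9.15).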

Since there are c classes, only $c-1$ eigenvalues, denoted by $\{\lambda_k\}_{k=1}^{c-1}$, are nonzero,
where each eigenvalue $\lambda_k$ produces its own eigenvector $\mathbf{w}_k^*$. By means of these
eigenvectors we can define a Fisher's discriminant analysis-based optimal linear
transformation $\varphi^*$ via (9.14), (9.20)-(9.23) by

    $\mathbf{y} = \varphi^*(\mathbf{x}) = (\mathbf{W}^*)^T \mathbf{x}$    (9.24)

where the size of $\mathbf{W}^*$ is $d \times (c-1)$ with the k-th column specified by the k-th
eigenvector $\mathbf{w}_k^*$. The feature vector $\mathbf{y}$ derived by (9.14) is a feature representation of $\mathbf{x}$
that will be used for classification. It should be noted that there is no direct relationship
between $\tilde{d}$, the dimensionality of the feature space $R^{\tilde{d}}$, and c, the number of classes.
However, it is often the case that $\tilde{d} < c$. For more details, we refer to Duda and Hart
(1973).
According to the linear mixture model described by (3.1), there are p target
signatures to be classified. Consequently, there are only $p-1$ nonzero eigenvalues.
Assume that $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_{p-1} > 0$ are arranged in decreasing order of magnitude.
Their corresponding eigenvectors $\{\mathbf{w}_k^*\}_{k=1}^{p-1}$ resulting from (9.22) are called Fisher's
discriminant vectors. For instance, $\mathbf{w}_1^*$ corresponding to $\lambda_1$ is the first Fisher's
discriminant vector, $\mathbf{w}_2^*$ corresponding to $\lambda_2$ is the second Fisher's discriminant vector,
etc. Using these $p-1$ Fisher's discriminant vectors we can construct an eigen-matrix
$\mathbf{W}^*$ given by $\mathbf{W}^* = [\mathbf{w}_1^*\ \mathbf{w}_2^* \cdots \mathbf{w}_{p-1}^*]$ to project a pixel vector $\mathbf{x}$ onto its feature
vector $\mathbf{y} = (\mathbf{W}^*)^T\mathbf{x}$ in a new space Z that is linearly spanned by $\{\mathbf{w}_k^*\}_{k=1}^{p-1}$. The Fisher LDA
classification is then carried out in the space Z using a classification measure such as the
minimum distance measures given by (9.5)-(9.9).
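
For two classes the procedure collapses to a single discriminant vector, which makes a compact illustration possible. The sketch below relies on the standard two-class result (see Duda and Hart, 1973) that the generalized eigenvalue problem is solved, up to scale, by the inverse within-class scatter matrix applied to the difference of the class means. It is restricted to two bands so that the 2 x 2 inverse can be written in closed form; the training samples and function names are hypothetical.

```python
from typing import List

Vector = List[float]


def mean(vectors: List[Vector]) -> Vector:
    """Component-wise mean of a list of equal-length vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def fisher_direction_2d(class_a: List[Vector], class_b: List[Vector]) -> Vector:
    """Two-class, two-band Fisher direction w = S_W^{-1} (mu_a - mu_b)."""
    mu_a, mu_b = mean(class_a), mean(class_b)
    s00 = s01 = s11 = 0.0          # entries of the symmetric 2 x 2 matrix S_W
    for members, mu in ((class_a, mu_a), (class_b, mu_b)):
        for x in members:
            d0, d1 = x[0] - mu[0], x[1] - mu[1]
            s00 += d0 * d0
            s01 += d0 * d1
            s11 += d1 * d1
    det = s00 * s11 - s01 * s01
    if abs(det) < 1e-12:
        raise ValueError("S_W is singular; a pseudo-inverse would be needed")
    g0, g1 = mu_a[0] - mu_b[0], mu_a[1] - mu_b[1]
    # Closed-form inverse of a symmetric 2 x 2 matrix applied to (g0, g1).
    return [(s11 * g0 - s01 * g1) / det, (s00 * g1 - s01 * g0) / det]


def project(w: Vector, x: Vector) -> float:
    """One-dimensional feature y = w^T x."""
    return w[0] * x[0] + w[1] * x[1]


def classify(w: Vector, x: Vector, class_means: List[Vector]) -> int:
    """Minimum distance decision carried out in the projected space."""
    y = project(w, x)
    return min(range(len(class_means)),
               key=lambda j: abs(y - project(w, class_means[j])))


# Hypothetical two-band training samples.
class_a = [[4.0, 2.0], [5.0, 3.0], [6.0, 3.0], [5.0, 4.0]]
class_b = [[1.0, 1.0], [2.0, 2.0], [2.0, 1.0], [1.0, 2.0]]
w = fisher_direction_2d(class_a, class_b)
label = classify(w, [4.5, 2.0], [mean(class_a), mean(class_b)])
```

With more than two classes or bands, a numerical generalized eigenvalue solver would take the place of the closed-form inverse; the decision step in the projected space stays the same.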

9.2.4 Unsupervised Classification 

Although the distance-based classifiers described above are supervised and require a
set of training samples, they can be extended to unsupervised classifiers by including a
clustering process such as the nearest neighboring rule (NNR) described in Section 5.2 or
a neural network-based self-organization algorithm (Haykin, 1994). For example,
ISODATA, which has been widely used in pattern classification (Duda and Hart, 1973),
implements the minimum distance classifier in conjunction with an NNR clustering
process.



9.3 CRITERIA FOR TARGET DETECTION AND CLASSIFICATION 

In this section, we develop a set of custom-designed criteria for target detection and
classification. Using the 15-panel HYDICE scene in Fig. 1.7(a) as an example, a typical
masked target of size 4 x 4 is shown in Fig. 9.1, where an R pixel (i.e., a red pixel in
Fig. 1.7(b)) is considered to be a target center pixel and a Y pixel (i.e., a yellow pixel in
Fig. 1.7(b)) surrounding R pixels is a target pixel that may be either a target boundary
pixel or a target pixel mixed with background pixels. With the ground truth map given
in Fig. 1.7(b) we can actually tally the panel pixels that are detected and classified by a
specific algorithm.

Figure 9.1. A typical masked target

Here, we make a subtle distinction between a target detected and a target hit. A
target is detected when at least one of its R pixels is detected. A target is
hit when at least one of its R or Y pixels is detected. So, by this definition,
a detected target always implies a hit target, but not vice versa.

The criteria that we develop in this chapter are (1) how many target R pixels are
detected; (2) how many target Y pixels are detected; (3) how many pixels are detected as
false alarms for a target, in which case neither an R pixel nor a Y pixel is detected; (4) how
many target R pixels are missed. For example, suppose that the shaded pixels in Fig. 9.1
are those detected by a detection algorithm. We declare that the target is detected with
one R pixel as well as hit with one R and two Y pixels. There are no false alarm pixels,
but three R pixels are missed by detection. In order to quantitatively study
target detection performance, the following definitions are introduced. Let

$N$ = total number of sample pixel vectors
$\mathbf{t}$ = specific target to be detected
$N_{R+Y}(\mathbf{t})$ = total number of R plus Y pixels specified by t
$N_R(\mathbf{t})$ = total number of R pixels specified by t
$N_Y(\mathbf{t})$ = total number of Y pixels specified by t
$N_{(R+Y)D}(\mathbf{t})$ = total number of either R or Y pixels detected as t
$N_{RD}(\mathbf{t})$ = total number of R pixels detected as t
$N_{YD}(\mathbf{t})$ = total number of Y pixels detected as t
$N_{FA}(\mathbf{t})$ = total number of false alarm pixels, i.e., total number of pixels, which are
neither R nor Y pixels, detected as t
$N_M(\mathbf{t}) = N_{R+Y}(\mathbf{t}) - N_{(R+Y)D}(\mathbf{t})$ = total number of R and Y pixels specified by t
that are missed

Using the above notations, we can further define the R-detection rate $R_{RD}(\mathbf{t})$ for target
t by

    $R_{RD}(\mathbf{t}) = N_{RD}(\mathbf{t}) / N_R(\mathbf{t})$    (9.25)

and the Y-detection rate $R_{YD}(\mathbf{t})$ for target t by

    $R_{YD}(\mathbf{t}) = N_{YD}(\mathbf{t}) / N_Y(\mathbf{t})$.    (9.26)

Since R pixels represent target center pixels and Y pixels are target boundary pixels or
pixels mixed with background pixels, a good detection algorithm must have a high
target R-detection rate, $R_{RD}(\mathbf{t})$. On the other hand, detecting a Y pixel does not necessarily
imply a target detected. Nevertheless, in this case, target t can be declared to be hit. For
this purpose, we define the target hit rate $R_H(\mathbf{t})$ for target t by

    $R_H(\mathbf{t}) = N_{(R+Y)D}(\mathbf{t}) / N_{R+Y}(\mathbf{t})$.    (9.27)

So, from (9.27) a higher target hit rate $R_H(\mathbf{t})$ may not mean a higher target R-detection
rate $R_{RD}(\mathbf{t})$, or vice versa. This is because the number of Y pixels is generally much
greater than the number of R pixels. In most cases, Y pixels may even determine the
performance of $R_H(\mathbf{t})$. As will be shown in the experiments, a detection algorithm may
detect all R pixels but miss all Y pixels. In this case, the algorithm achieves a 100%
target R-detection rate, $R_{RD}(\mathbf{t}) = 1$, but a target Y-detection rate $R_{YD}(\mathbf{t}) = 0$. As a result, its
target hit rate $R_H(\mathbf{t})$ is very small because $R_{YD}(\mathbf{t}) = 0$. On the other hand, if the target hit
rate $R_H(\mathbf{t}) = 1$, all the R and Y pixels are detected. In this case, even though a
target is hit, we may still have difficulty locating where the target is. So, the target
R-detection rate $R_{RD}(\mathbf{t})$ is more important than $R_H(\mathbf{t})$ since it provides the information
about the exact location of the target t.

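In addition to (9.25)-(9.27), we are also interested in the target false alarm rate $R_{FA}(\mathbf{t})$ and
the target miss rate $R_M(\mathbf{t})$. They can be defined as follows.

    $R_{FA}(\mathbf{t}) = N_{FA}(\mathbf{t}) / (N - N_{R+Y}(\mathbf{t}))$    (9.28)

    $R_M(\mathbf{t}) = 1 - R_H(\mathbf{t}) = N_M(\mathbf{t}) / N_{R+Y}(\mathbf{t}) = (N_{R+Y}(\mathbf{t}) - N_{(R+Y)D}(\mathbf{t})) / N_{R+Y}(\mathbf{t})$    (9.29)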
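
The five per-target rates are simple ratios of the tallied counts, as the following sketch shows. It assumes that the number of R or Y pixels detected equals the number of R pixels detected plus the number of Y pixels detected; the class name, field names and the counts are hypothetical, loosely modeled on the masked-target illustration (one R and two Y pixels detected, no false alarms).

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class TargetCounts:
    n_total: int   # N, total number of sample pixel vectors
    n_r: int       # number of R pixels specified by the target
    n_y: int       # number of Y pixels specified by the target
    n_rd: int      # number of R pixels detected as the target
    n_yd: int      # number of Y pixels detected as the target
    n_fa: int      # number of false alarm pixels


def detection_rates(c: TargetCounts) -> Dict[str, float]:
    """Per-target rates following the pattern of (9.25)-(9.29)."""
    n_ry = c.n_r + c.n_y        # R plus Y pixels specified by the target
    n_ryd = c.n_rd + c.n_yd     # R or Y pixels detected (assumed additive)
    return {
        "R_RD": c.n_rd / c.n_r,
        "R_YD": c.n_yd / c.n_y,
        "R_H": n_ryd / n_ry,
        "R_FA": c.n_fa / (c.n_total - n_ry),
        "R_M": (n_ry - n_ryd) / n_ry,
    }


# Hypothetical counts for a single 4 x 4 masked target in a 100-pixel image.
counts = TargetCounts(n_total=100, n_r=4, n_y=12, n_rd=1, n_yd=2, n_fa=0)
rates = detection_rates(counts)
```

In this example the hit rate and the miss rate sum to one, as required by the relation between them.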

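If there are p targets, $T = \{\mathbf{t}_i\}_{i=1}^{p}$, to be classified, the overall detection rate $R_{OD}(T)$ for a
class of targets T can be defined as

    $R_{OD}(T) = \sum_{i=1}^{p} p(\mathbf{t}_i) R_{RD}(\mathbf{t}_i)$    (9.30)

where

    $p(\mathbf{t}_i) = N(\mathbf{t}_i) / \sum_{k=1}^{p} N(\mathbf{t}_k)$ for $1 \le i \le p$.    (9.31)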
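
The overall detection rate is a weighted average of the per-target rates, as sketched below. The sketch assumes that the weight of each target is proportional to its number of R pixels, which is an interpretation on our part since the text does not define the target size used in the weights. The inputs are the R-pixel counts and R-detection rates that Table 9.2(b) reports for the five panels.

```python
from typing import Sequence


def overall_rate(sizes: Sequence[int], rates: Sequence[float]) -> float:
    """Weighted average of per-target rates with weights sizes / sum(sizes)."""
    total = sum(sizes)
    return sum((n / total) * r for n, r in zip(sizes, rates))


# R-pixel counts and R-detection rates for P1-P5 as listed in Table 9.2(b).
n_r = [3, 4, 4, 4, 4]
r_rd = [1.00, 1.00, 1.00, 1.00, 0.75]
r_od = overall_rate(n_r, r_rd)   # equals 18/19
```

Under this assumption the result is 18/19, which rounds to the 0.95 shown in the bottom row of Table 9.2(b).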

As will be seen in the following experiments, a higher $R_{OD}(T)$ does not imply higher
classification accuracy because it may happen that several targets are detected in one
single image due to their similar signature spectra, but may be difficult to discriminate
one from another. As a result, it may yield poor classification. In order to account for
this phenomenon we define the classification rate for a specific target $\mathbf{t}_i$, $R_C(\mathbf{t}_i)$, as

    $R_C(\mathbf{t}_i) = N_{RD}(\mathbf{t}_i) / (N_R(\mathbf{t}_i) + N_{FA}(\mathbf{t}_i))$    (9.32)

and the overall classification rate as

    $R_{OC}(T) = \sum_{i=1}^{p} p(\mathbf{t}_i) R_C(\mathbf{t}_i)$    (9.33)

where $p(\mathbf{t}_i)$ and $R_C(\mathbf{t}_i)$ are defined by (9.31) and (9.32) respectively. Now using
(9.25)-(9.33) as criteria, we can evaluate detection and classification performance of various
algorithms.



9.4 COMPARATIVE PERFORMANCE ANALYSIS 

This section presents a thorough quantitative and comparative analysis among MPC
and PPC algorithms described in this chapter. In order to validate such a study, the
15-panel HYDICE image scene in Fig. 1.7(a) is used for experiments. The spatial positions
of these 15 target panels are precisely located at pixel resolution by the ground truth
map in Fig. 1.7(b), where the panel pixels are designated as either center pixels or
masking pixels. A center pixel is generally considered as a pure pixel, while a masking
pixel is a pixel that may be a boundary pixel or a panel pixel mixed with
background signatures. Using this data scene along with the designed criteria given in
Section 9.3, a comparative analysis on classification accuracy becomes possible. The
significance of this chapter is to offer a rigorous and objective means to evaluate the
performance of each candidate algorithm on the same common ground. The target
signature matrix M used for the 15 panels is formed by the five spectral signatures, P1,
P2, P3, P4 and P5 in Fig. 1.8.

Example 9.1 (MPC comparative analysis) 

In this example, an experiment-based quantitative study is conducted among three
MPC classifiers, OSP, SSP and OBSP. In Chapter 7, it was shown that the GML
classifier was identical to the OBSP classifier (see (7.26) and (7.31)). For this reason, only
the OBSP classifier was included in the experiments. Fig. 9.2(a-c) shows the classification
results of the 15 panels produced by OSP, SSP and OBSP respectively.

Figure 9.2. Classification results of OSP, SSP and OBSP

The above experimental results may lead to a conclusion that the three classifiers are
essentially the same classifier. Unfortunately, this is not the case, as shown in the following
example.

Example 9.2 

In this example, 250 mixed pixels were simulated according to Table 9.1 using the
five panel signatures in Fig. 1.8.

Table 9.1. Abundance fractions assigned to the simulated pixels

Pixel #       P1    P2    P3    P4    P5   Tree  Grass
1-10         10%    0%    0%    0%    0%    45%    45%
11-20        20%    0%    0%    0%    0%    40%    40%
21-30        40%    0%    0%    0%    0%    30%    30%
31-40        60%    0%    0%    0%    0%    20%    20%
41-50        80%    0%    0%    0%    0%    10%    10%
51-60         0%   10%    0%    0%    0%    45%    45%
61-70         0%   20%    0%    0%    0%    40%    40%
71-80         0%   40%    0%    0%    0%    30%    30%
81-90         0%   60%    0%    0%    0%    20%    20%
91-100        0%   80%    0%    0%    0%    10%    10%
101-110       0%    0%   10%    0%    0%    45%    45%
111-120       0%    0%   20%    0%    0%    40%    40%
121-130       0%    0%   40%    0%    0%    30%    30%
131-140       0%    0%   60%    0%    0%    20%    20%
141-150       0%    0%   80%    0%    0%    10%    10%
151-160       0%    0%    0%   10%    0%    45%    45%
161-170       0%    0%    0%   20%    0%    40%    40%
171-180       0%    0%    0%   40%    0%    30%    30%
181-190       0%    0%    0%   60%    0%    20%    20%
191-200       0%    0%    0%   80%    0%    10%    10%
201-210       0%    0%    0%    0%   10%    45%    45%
211-220       0%    0%    0%    0%   20%    40%    40%
221-230       0%    0%    0%    0%   40%    30%    30%
231-240       0%    0%    0%    0%   60%    20%    20%
241-250       0%    0%    0%    0%   80%    10%    10%

As an example, the first 50 pixels were simulated by the panel signature P1 with the
pixel numbers 1-10 containing 10% P1 signature, pixel numbers 11-20 containing 20%
P1 signature, pixel numbers 21-30 containing 40% P1 signature, pixel numbers 31-40
containing 60% P1 signature and pixel numbers 41-50 containing 80% P1 signature. In order
to make the abundance fractions sum to one, we added two background signatures, tree and grass
extracted from the scene in Fig. 1.7(a), which evenly split the remaining abundance. For
instance, the first 10 pixels contained 10% P1 signature plus 45% tree and 45% grass,
while the last 10 pixels (i.e., pixel numbers 241-250) contained 80% P5 signature, 10%
tree and 10% grass. Additionally, Gaussian noise was added to achieve a 30:1
signal-to-noise ratio as defined in Harsanyi and Chang (1994). The resulting 250
simulated pixels are shown in Fig. 9.3.
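
The abundance layout of the 250 simulated pixels follows a regular pattern, which the sketch below reproduces. It generates only the abundance percentages; the spectral mixing with the actual signatures and the additive Gaussian noise are left out, since they depend on data and on a signal-to-noise definition that are not reproduced here. All names are hypothetical.

```python
from typing import Dict, List

PANELS = ["P1", "P2", "P3", "P4", "P5"]
TARGET_PERCENTAGES = [10, 20, 40, 60, 80]
PIXELS_PER_LEVEL = 10


def simulated_abundances() -> List[Dict[str, int]]:
    """Abundance percentages for the simulated pixels, in pixel-number order."""
    rows: List[Dict[str, int]] = []
    for panel in PANELS:
        for pct in TARGET_PERCENTAGES:
            # Tree and grass evenly split whatever the panel leaves over.
            background = (100 - pct) // 2
            row = {name: 0 for name in PANELS}
            row[panel] = pct
            row["Tree"] = background
            row["Grass"] = background
            rows.extend(dict(row) for _ in range(PIXELS_PER_LEVEL))
    return rows


pixels = simulated_abundances()
first_pixel = pixels[0]      # pixel number 1
last_pixel = pixels[249]     # pixel number 250
```

Integer percentages are used instead of fractions so that the sum-to-one constraint can be checked exactly.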



Figure 9.3. Simulated pixels at band 35




The classification results of OSP, SSP and OBSP classifiers are shown in the first,
third and fourth columns of Fig. 9.4.

Example 9.3

In the previous two examples, comparisons were made among OSP-based classifiers.
The abundance fractional images shown in Fig. 9.2(a-c) are gray scale and can only be
interpreted by visual inspection. In order to avoid human intervention, these abundance
fractional images must be thresholded by an automatic method. WTAMPCV and
a%MPCV described by (9.2) and (9.3) can be used for this purpose. Using the criteria
defined in Section 9.3 we tallied results by applying WTAMPCV and a%MPCV to the
abundance fractional images in Fig. 9.2(a-c). Tables 9.2(a-b) tabulate detection and
classification rates resulting from WTAMPCV for the three classifiers, OSP, SSP and
OBSP. It should be noted that SSP and OBSP produced the same results in
Table 9.2(b) even though they differ in that SSP implements a signature subspace projector,
given by (8.12), which is absent in (8.27).



Table 9.2(a). Results of OSP resulting from WTAMPCV

        N_R  N_Y  N_R+Y  N_RD  N_YD  N_(R+Y)D  N_FA  N_M  R_RD  R_YD   R_H  R_FA   R_M
P1        3   47     50     3    27        30  1213   20  1.00  0.58  0.60  0.30  0.40
P2        4   41     45     4     8        12  1010   33  1.00  0.20  0.27  0.25  0.73
P3        4   37     41     4    32        36  1042    5  1.00  0.87  0.88  0.26  0.12
P4        4   40     44     4    10        14   348   30  1.00  0.25  0.32  0.09  0.68
P5        4   39     43     4     3         7   384   36  1.00  0.08  0.16  0.10  0.84
Total    19  204    223    19    80        99  3997  124  1.00  0.39  0.44  0.20  0.56






Table 9.2(b). Results of SSP and OBSP resulting from WTAMPCV

        N_R  N_Y  N_R+Y  N_RD  N_YD  N_(R+Y)D  N_FA  N_M  R_RD  R_YD   R_H  R_FA   R_M
P1        3   47     50     3    19        22  1136   28  1.00  0.40  0.44  0.28  0.56
P2        4   41     45     4    11        15  1210   30  1.00  0.27  0.33  0.30  0.67
P3        4   37     41     4    16        20   351   21  1.00  0.43  0.49  0.09  0.51
P4        4   40     44     4    18        22   778   22  1.00  0.45  0.50  0.20  0.50
P5        4   39     43     3     7        10   532   33  0.75  0.18  0.23  0.14  0.77
Total    19  204    223    18    71        89  4007  134  0.95  0.35  0.40  0.20  0.60





As a comparison, we also applied a%MPCV to the abundance fractional images in
Fig. 9.2(a-c). Tables 9.3-9.4 are produced by 50%MPCV and 25%MPCV respectively.



Table 9.3. Results of OSP, SSP and OBSP resulting from 50%MPCV

        N_R  N_Y  N_R+Y  N_RD  N_YD  N_(R+Y)D  N_FA  N_M  R_RD  R_YD   R_H  R_FA   R_M
P1        3   47     50     0     0         0    76   50  0.00  0.00  0.00  0.02  1.00
P2        4   41     45     3     2         5   352   40  0.75  0.05  0.11  0.09  0.89
P3        4   37     41     3     2         5     0   36  0.75  0.05  0.12  0.00  0.88
P4        4   40     44     3     3         6    74   38  0.75  0.08  0.14  0.02  0.86
P5        4   39     43     3     2         5     2   38  0.75  0.05  0.12  0.00  0.88
Total    19  204    223    12     9        21   504  202  0.63  0.05  0.10  0.03  0.90



Table 9.4. Results of OSP, SSP and OBSP resulting from 25%MPCV

        N_R  N_Y  N_R+Y  N_RD  N_YD  N_(R+Y)D  N_FA  N_M  R_RD  R_YD   R_H  R_FA   R_M
P1        3   47     50     0     0         0   267   50  0.00  0.00  0.00  0.07  1.00
P2        4   41     45     4     6        10   799   35  1.00  0.15  0.22  0.20  0.78
P3        4   37     41     4     4         8   145   33  1.00  0.11  0.20  0.04  0.80
P4        4   40     44     4    19        23   552   21  1.00  0.48  0.52  0.14  0.48
P5        4   39     43     3     4         7   273   36  0.75  0.10  0.16  0.07  0.84
Total    19  204    223    15    33        48  2036  175  0.79  0.18  0.23  0.11  0.77

Interestingly, unlike WTAMPCV, all three classifiers, OSP, SSP and OBSP,
produced the same results for each case. These experiments imply that classification
results can be altered by a different MPCV.

Since the a% used in a%MPCV can go from 100% down to 0%, we can actually
compute an ROC curve for a%MPCV. However, in order for an ROC curve to be
meaningful, the sample pool must be large enough to produce reliable statistics. In the 15-panel
HYDICE image scene, there are only a total of 19 target R center pixels in the
scene. It makes little sense to plot an ROC curve based only on these 19 R pixels. In
this case, tallying the detection results as we did in Tables 9.2-9.4 is not an effective means
to show detection performance. As an alternative, we replace the target detection
probability by the target hit probability defined by (9.27), where there is a sufficient
number of Y pixels (a total of 204 in the image scene) to generate a reasonable ROC
curve. As a result, each a% generates a target hit probability and a false alarm
probability. Then a 3-dimensional (3-D) ROC curve can be plotted based on the generated
target hit probability versus its false alarm probability via a%MPCV, where the x and y axes
are specified by the false alarm probability and the abundance percentage a% with the target hit
probability represented by the z-axis. Because our experiments showed that the three classifiers,
OSP, SSP and OBSP, generated the same ROC curve, Fig. 9.5(a) only shows the 3-D
curve of the 15-panel HYDICE image scene produced by SSP using a%MPCV. Fig.
9.5(b) is its corresponding 2-D ROC curve plotted by the target hit probability versus the
false alarm probability. As shown in Fig. 9.5, the results were not good compared to the
results obtained in Fig. 11.5 for other classifiers, which will be discussed in Chapter 11.
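
The points of such a 3-D ROC curve can be generated by sweeping the threshold, as in the following sketch. It assumes one abundance estimate per pixel for a single target, a ground truth flag that is true for R and Y pixels, and a "greater than or equal to" threshold test; the function name and the six sample pixels are hypothetical.

```python
from typing import List, Sequence, Tuple


def roc_points(abundances: Sequence[float], is_target: Sequence[bool],
               thresholds: Sequence[float]) -> List[Tuple[float, float, float]]:
    """Return (false alarm rate, a%, target hit rate) triples, one per threshold."""
    n_target = sum(is_target)                   # R plus Y pixels
    n_background = len(is_target) - n_target    # all remaining pixels
    points = []
    for a in thresholds:
        cut = a * 1e-2
        hits = sum(1 for alpha, t in zip(abundances, is_target)
                   if t and alpha >= cut)
        false_alarms = sum(1 for alpha, t in zip(abundances, is_target)
                           if not t and alpha >= cut)
        points.append((false_alarms / n_background, a, hits / n_target))
    return points


# Hypothetical abundance estimates for six pixels and their ground truth flags.
abundances = [0.9, 0.6, 0.3, 0.2, 0.05, 0.4]
is_target = [True, True, True, False, False, False]
curve = roc_points(abundances, is_target, thresholds=[0, 25, 50, 100])
```

Dropping the middle coordinate of each triple gives the points of the corresponding 2-D curve of hit rate versus false alarm rate.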





Figure 9.5. ROC curve generated by a%MPCV using SSP for the 15-panel HYDICE scene

It is interesting to compare the performance of MPCV to that of PPC. In the
following experiments we implemented the PPC classifiers described in Section 9.2 for
comparison. Since each target panel contains no more than 4 R pixels, far fewer
than the number of bands, supervised second-order minimum distance-based
classifiers specified by (9.9) and (9.10) are generally not applicable because the
covariance matrices will be rank-deficient. Under this circumstance, the number of target R
pixels was used as the rank to calculate the covariance matrix. In order to further simplify
experiments, only ED and MD were used for comparison because they are representatives
of the first-order and second-order minimum distance-based classifiers. Tables 9.5-9.8
tally the detection and classification results produced by ED, LDAED (LDA using ED),
MD and LDAMD (LDA using MD) respectively. From these four tables, LDAED
performed the best among the four distance-based pure pixel classifiers. The reason why
MD-based classifiers did not perform well was that there were insufficient target
samples from which they could estimate sample spectral correlation. In our
experiments, the lack of target samples made it difficult to calculate the sample
covariance matrix since it was not of full rank. One way to alleviate this problem was
suggested in Ren (1998), where a new set of extra samples could be generated from the
original target pixels by nonlinear functions such as auto-correlation and cross-correlation.



Table 9.5. Results of ED for the 15-panel HYDICE scene in Fig. 1.7(b)

        N_R  N_Y  N_R+Y  N_RD  N_YD  N_(R+Y)D  N_FA  N_M  R_RD  R_YD   R_H  R_FA   R_M
P1        3   47     50     2     2         4   120   46  0.67  0.04  0.08  0.03  0.92
P2        4   41     45     3    11        14   376   31  0.75  0.27  0.31  0.09  0.69
P3        4   37     41     3     2         5     0   36  0.75  0.05  0.12  0.00  0.88
P4        4   40     44     0     2         2    67   42  0.00  0.05  0.05  0.02  0.95
P5        4   39     43     3     1         4    65   39  0.75  0.03  0.09  0.02  0.91
Total    19  204    223    11    18        29   628  194  0.58  0.09  0.13  0.03  0.87



Table 9.6. Results of LDAED for the 15-panel HYDICE scene in Fig. 1.7(b)

        N_R  N_Y  N_R+Y  N_RD  N_YD  N_(R+Y)D  N_FA  N_M  R_RD  R_YD   R_H  R_FA   R_M
P1        3   47     50     3     8        11     1   39  1.00  0.17  0.22  0.00  0.78
P2        4   41     45     4     3         7     0   38  1.00  0.07  0.16  0.00  0.84
P3        4   37     41     4     4         8     0   33  1.00  0.11  0.20  0.00  0.80
P4        4   40     44     4     4         8     0   36  1.00  0.10  0.18  0.00  0.82
P5        4   39     43     4     5         9     0   34  1.00  0.13  0.21  0.00  0.79
Total    19  204    223    19    24        43     1  180  1.00  0.11  0.19  0.00  0.81



Table 9.7. Results of MD for the 15-panel HYDICE scene in Fig. 1.7(b)

        N_R  N_Y  N_R+Y  N_RD  N_YD  N_(R+Y)D  N_FA  N_M  R_RD  R_YD   R_H  R_FA   R_M
P1        3   47     50     0     0         0    10   50  0.00  0.00  0.00  0.00  1.00
P2        4   41     45     1     1         2    78   43  0.25  0.02  0.04  0.02  0.96
P3        4   37     41     3    10        13   464   28  0.75  0.27  0.32  0.11  0.68
P4        4   40     44     1     0         1     7   43  0.25  0.00  0.02  0.00  0.98
P5        4   39     43     2     1         3     2   40  0.50  0.03  0.07  0.00  0.93
Total    19  204    223     7    12        19   561  204  0.37  0.07  0.09  0.03  0.91






Table 9.9. Results of LDAMD for the 15-panel HYDICE scene in Fig. 1.7(b)

Panel   N_R   N_Y  N_R+Y  N_RD  N_YD  N_(R+Y)D   N_F   N_M  R_RD  R_YD   R_H   R_F   R_M
P1        3    47     50     1     1         2     0    48  0.33  0.02  0.04  0.00  0.96
P2        4    41     45     3     2         5     0    40  0.75  0.05  0.11  0.00  0.89
P3        4    37     41     4     2         6     0    35  1.00  0.05  0.15  0.00  0.85
P4        4    40     44     2     2         4     0    40  0.50  0.05  0.09  0.00  0.91
P5        4    39     43     2     2         4     0    39  0.50  0.05  0.09  0.00  0.91
Total    19   204    223    12     9        21     0   202  0.63  0.04  0.10  0.00  0.90



If we further compare the results of LDAED with those of SSP, we find that LDAED
performed better than SSP in terms of R-pixel detection but worse than SSP in Y-pixel
detection rate as well as target hit rate. However, both produced very close false alarm
and target miss rates. More examples and experiments can also be found in Chang and
Ren (2000).



9.5 CONCLUSIONS 

Many hyperspectral target detection and image classification algorithms have been
proposed in the literature. Comparing one to another is very challenging due to a lack of
standardized data. Another difficulty arises from the fact that there are no rigorous
criteria to substantiate an algorithm. This chapter reinterprets MPC from the viewpoint of
PPC by imposing constraints on target signature abundance fractions. As a result, the
classes of target detection and classification algorithms can be reduced to three categories:
OSP-based mixed pixel classifiers, minimum distance-based pure pixel classifiers and
Fisher's LDA-based classifier. In addition, two MPCVs are developed to convert an MPC
problem into a PPC problem so that conventional PPC techniques can be readily
applied. According to our experiments, MPCV may perform better than the minimum
distance-based PPC. Unfortunately, it does not work as well as the class separability-based
LDA due to the loss of gray level information about abundance fractions resulting from
MPCV. The information provided by the abundance-based gray scale images generated
by MPC algorithms contains very useful visual features, which can substantially
improve image interpretation of classification results. The PPC algorithms cannot obtain
such gray level information. Although we have made an effort to make the comparative
analysis as comprehensive and rigorous as possible, completeness is not claimed. In
particular, the WTA-based converter used in this chapter for tallying target pixels is a
simple thresholding technique and may not necessarily be optimal. There may exist a more
effective MPCV, which can produce better pure pixel classification performance. A
thresholding technique using probability distribution in Chapter 16 and a Neyman-Pearson
detection theory-based thresholding method in Chapter 17 are also promising approaches to
MPCV. Finally, it should be noted that all the algorithms considered in this chapter are
unconstrained in the sense that no constraints are imposed on signature abundances.
Constrained mixed pixel classification problems will be investigated in
Chapter 10.




IV 



CONSTRAINED MIXED PIXEL CLASSIFICATION 



The unconstrained mixed pixel classification (unconstrained MPC) discussed in
PART III estimates abundance fractions of target signatures present in a pixel vector via a
linear mixture model without imposing any constraint on the target signatures. The
estimated abundance fractions are then used to classify an image pixel into one of the
targets. Like subpixel detection, the performance of unconstrained MPC can be further
improved by imposing constraints on target signatures or target abundance fractions. So,
in PART IV two approaches to constrained MPC will be investigated, target
abundance-constrained MPC (TACMPC) and target signature-constrained MPC (TSCMPC).
Chapter 10 presents two least-squares based techniques, a fully constrained least-squares
(FCLS) MPC and a modified FCLS (MFCLS). Both extend the partially constrained least
squares methods, SCLS and NCLS, developed for subpixel detection in Chapter 3. The
most significant benefit resulting from fully constrained MPC is mixed
pixel quantification, a task that unconstrained MPC cannot achieve. The techniques in
Chapter 10 provide solutions to target quantification at subpixel level. As mentioned in
PART II, the vector direction may be as important a feature as abundance fractions to
characterize a vector. In some cases such as target detection, target abundance fractions
may not be of main concern. Instead, the directions of target signature vectors may
provide more crucial information than target abundance fractions in image analysis. This
was demonstrated in Chapter 4. So, in contrast to TACMPC, Chapters 11 and 12
consider TSCMPC, which constrains the vector directions of target signatures rather than
abundance fractions. As noted for the spectral angle mapper (SAM), two pixel vectors
pointing in similar directions tend to have similar spectral properties, whereas the
magnitude of a pixel is determined by its vector length, which represents the amount
of abundance contained in the pixel vector. Chapter 11 extends the LCMV
subpixel detectors, CEM and TCIMF in Chapter 4, to various LCMV classifiers where
the feasibility of real-time implementation is also discussed. Chapter 12 develops another
new approach, linearly constrained discriminant analysis (LCDA), that is derived from
Fisher's linear discriminant analysis considered in Chapter 9. It constrains the directions
of targets of interest to be aligned with predetermined coordinates so that the targets to be
classified can be separated in a way that we desire. The derived LCDA classifier turns out
to be a constrained version of Harsanyi-Chang's OSP classifier in Chapter 8. Recently, a
filter-vectors (FV) method was developed by Bowles et al. (1995) for
mixed pixel classification. Surprisingly, Bowles et al.'s FV method can be shown to be
a special version of LCMV and LCDA in Chapters 11-12. In particular, like OSP, which
can be considered as a whitened CEM, the FV method can be interpreted as a whitened
background-removed LCMV (BRLCMV) classifier.




10 



TARGET ABUNDANCE-CONSTRAINED MIXED PIXEL 

CLASSIFICATION (TACMPC) 



The mixed pixel classification (MPC) considered in Chapters 8 and 9 is
unconstrained: no constraint is imposed on the target signature abundance fractions.
Consequently, the resulting abundance estimates do not necessarily reflect the true
amounts of abundance. These estimates can therefore be used only for
target detection, discrimination and classification, but not for target quantification. In
order for MPC to perform mixed pixel quantification, we need to consider a fully
constrained mixed pixel classification problem, which imposes two constraints on the
abundance fractions of target signatures. They were described in Chapter 3: (a) the abundance
sum-to-one constraint, referred to as ASC, $\sum_{j=1}^{p}\alpha_j = 1$, and (b) the abundance
nonnegativity constraint, $\alpha_j \ge 0$ for all $1 \le j \le p$, referred to as ANC. Since no
closed form can be derived for a fully constrained mixed pixel classification problem, two
least-squares approaches are presented in this chapter. One is referred to as the fully
constrained least-squares method (FCLS), which develops an efficient algorithm to yield
least-squares optimal solutions. The other is referred to as the modified fully constrained
least-squares method (MFCLS), which applies the ASC but replaces ANC with the absolute
abundance sum-to-one constraint (AASC), i.e., $\sum_{j=1}^{p}|\alpha_j| = 1$, which allows one
to derive an analytical form for optimal solutions. Both approaches arrive at nearly the
same results. Additionally, both FCLS and MFCLS can be further extended to unsupervised
methods by incorporating the unsupervised algorithms presented in Chapter 5.



10.1 INTRODUCTION 

Spectral mixture analysis (SMA) has been widely used in remote sensing for
versatile applications such as target discrimination, detection and classification. A wealth
of SMA techniques has been reported in the literature. For the purpose of mathematical
tractability, most SMA-based techniques are developed based on a mixture model that is
assumed to be linear and unconstrained, for example, OSP in Chapter 3 and the a posteriori
OSP and Gaussian maximum likelihood classifiers in Chapters 8 and 9. Unfortunately, such
unconstrained methods generally do not take full advantage of a linear mixture model,
for which ASC and ANC can be imposed to improve solutions and, in particular, to provide
solutions to mixed pixel quantification problems. On the other hand, a linear constrained
mixed pixel classification problem involves solving a set of inequality constraints, which
usually does not have analytic solutions. In this case, we must rely on numerical
algorithms to render optimal solutions. Over the past years some efforts were devoted to
solving fully constrained linear mixing problems; however, they were designed mainly
for a small number of target signatures. For instance, Shimabukuro and Smith (1991)
considered several constrained least-squares mixing models and obtained constrained
least-squares solutions by solving an overdetermined system that consisted of m
equations with n unknowns and n < m, where m is the number of spectral channels and n
is the number of target signatures. Since there are no closed-form solutions, one must
exhaust all possible solutions in a feasible region bounded by ASC and ANC. The use of
quadratic programming techniques to impose the ASC and ANC was previously
investigated (Shimabukuro, 1987; Boardman, 1990; Settle and Drake, 1993), but these
methods are computationally expensive. Another method presented in Ashton and
Schaum (1998) also suffered from excessive computational complexity as the number of
target signatures increases.

In this chapter, we consider a fully constrained linear mixing problem for mixed
pixel quantification in hyperspectral imagery. Because there are no analytical solutions,
two least-squares-based approaches are developed. One is FCLS, which makes use of an
efficient numerical algorithm to generate optimal solutions. The proposed FCLS method
is based on a least-squares approach (Scharf, 1991) in conjunction with OSP in Chapter
3. The second method is MFCLS, which modifies the fully constrained linear mixing
problem by replacing ANC, $\alpha_j \ge 0$ for all $1 \le j \le p$, with AASC,
$\sum_{j=1}^{p}|\alpha_j| = 1$, while still implementing ASC, $\sum_{j=1}^{p}\alpha_j = 1$.
The advantage of the proposed MFCLS method is that it converts a set of inequality
constraints to an equality constraint so that a closed-form
solution can be found through Lagrange multipliers. Both approaches take advantage of
the SCLS solution derived in Chapter 3 and use it as a vehicle to bridge the gap between
unconstrained and fully constrained solutions. As shown in Section 3.3, the SCLS
solution can be obtained by an unconstrained least-squares solution plus an error
correction term. The unconstrained least-squares solution in the SCLS solution is
identical to the least-squares orthogonal subspace projection solution specified by (3.9) in
Chapter 3. From here MFCLS solves the FCLS problem with an additional constraint,
AASC, using Lagrange multipliers. Unlike MFCLS, FCLS implements the NCLS
algorithm developed in Section 3.3 coupled with the SCLS solution. It is also a
quadratic programming technique, but the algorithm used to simultaneously
implement both ASC and ANC is more computationally efficient. The significant
saving in computational cost becomes more evident when the FCLS method is extended
to an unsupervised FCLS method.

The three unsupervised methods described in Chapter 5 can be used to further extend
FCLS and MFCLS to unsupervised methods. Of particular
interest is the unsupervised FCLS method (UFCLS), which can be implemented by
FCLS along with the unsupervised NCLS algorithm described in Section 5.4. One major
advantage of the developed UFCLS is its ability to identify potential pure
pixels in an unknown image scene. These pixels are then used to generate more accurate
target signature information for supervised mixed pixel classification.



10.2 FULLY CONSTRAINED LEAST-SQUARES APPROACH

One simple approach to solving fully constrained linear mixing problems is to take
advantage of partially constrained solutions. For an SCLS solution, we can simply throw
out the target signatures with negative abundance fractions and normalize the abundance
fractions of the remaining target signatures to unity. The resulting solution is called
the normalized SCLS (NSCLS) solution. For an NCLS solution we can normalize it to
unity, which results in the normalized NCLS (NNCLS) solution. Unfortunately, as will
be shown in the experiments, neither NSCLS nor NNCLS yields optimal solutions
since the two constraints, ANC and ASC, are carried out sequentially, not
simultaneously. A method that simultaneously implemented ANC and ASC was recently
proposed in Heinz et al. (1999). However, it still produced only a nearly optimal
solution because it did not satisfy (3.24). In what follows, we will present an FCLS
algorithm that will produce accurate abundance fractions of target signatures in a linear
mixture of a mixed pixel. The proposed FCLS method is the same one considered in
Haskell and Hanson (1981). It extended the nonnegative least-squares algorithm in
Lawson and Hanson (1995) by including ASC.
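
The two normalization shortcuts described above can be sketched in a few lines. This is a
minimal illustration, assuming the SCLS or NCLS abundance vector has already been computed
elsewhere; the function names `nscls_from_scls` and `nncls_from_ncls` are hypothetical and
are not taken from the text.

```python
import numpy as np

def nscls_from_scls(alpha_scls):
    """Discard negative abundances (set them to zero), then rescale to sum to one."""
    alpha = np.clip(np.asarray(alpha_scls, dtype=float), 0.0, None)
    total = alpha.sum()
    if total == 0.0:
        raise ValueError("no positive abundance left to normalize")
    return alpha / total

def nncls_from_ncls(alpha_ncls):
    """Rescale an already nonnegative abundance vector so that it sums to one."""
    alpha = np.asarray(alpha_ncls, dtype=float)
    total = alpha.sum()
    if total == 0.0:
        raise ValueError("all abundances are zero")
    return alpha / total
```

In both functions one constraint is enforced after the other, which is the sequential
treatment the text identifies as the reason these solutions are generally not optimal.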

10.2.1 Fully Constrained Least-Squares Method (FCLS) 

In order to take care of ASC, we include ASC in the signature matrix M by
introducing a new signature matrix, denoted by N, which is defined by

$$
\mathbf{N} = \begin{bmatrix} \delta\mathbf{M} \\ \mathbf{1}^{T} \end{bmatrix} \tag{10.1}
$$

with $\mathbf{1} = (1\ 1\ \cdots\ 1)^{T}$, and a vector s by

$$
\mathbf{s} = \begin{bmatrix} \delta\mathbf{r} \\ 1 \end{bmatrix} \tag{10.2}
$$

The utilization of $\delta$ in (10.1)-(10.2) controls the effect of ASC. Using these two
equations an FCLS algorithm can be derived directly from the NCLS algorithm
described in Section 5.4 by replacing M and r used in the NCLS algorithm with N and s
respectively.
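
The augmentation in (10.1)-(10.2) can be illustrated with a short sketch. It assumes that
SciPy's `scipy.optimize.nnls`, a Lawson-Hanson-type nonnegative least-squares solver, may
stand in for the NCLS routine of the text; the function name `fcls` and the default value of
`delta` are illustrative assumptions rather than values taken from the book.

```python
import numpy as np
from scipy.optimize import nnls

def fcls(M, r, delta=1e-3):
    """Estimate abundances with ANC and (softly enforced) ASC.

    M     : (L, p) signature matrix, one target signature per column
    r     : (L,) pixel vector
    delta : weight from (10.1)-(10.2); smaller values enforce ASC more strongly
            (the default here is an assumption, not the book's setting)
    """
    p = M.shape[1]
    N = np.vstack([delta * M, np.ones((1, p))])   # N = [delta*M; 1^T], eq. (10.1)
    s = np.append(delta * r, 1.0)                 # s = [delta*r; 1],   eq. (10.2)
    alpha, _ = nnls(N, s)                         # nonnegative LS on the augmented system
    return alpha
```

Because ASC enters as an extra row of the least-squares system rather than as a hard
constraint, the estimated abundances sum to one only approximately, with the deviation
shrinking as `delta` decreases.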

10.2.2 Unsupervised FCLS Method (UFCLS) 

FCLS requires complete knowledge of the signature matrix M. In order to apply it
to a situation where no a priori information is available, we need an
unsupervised process to generate the required material information. Since
FCLS is a least-squares-based approach, the least-squares error (LSE)-based criterion
described in Section 5.4 is used as the optimality criterion and the UNCLS algorithm is
used to generate the desired target (material) information for the linear mixture model.
This procedure is referred to as the Unsupervised FCLS (UFCLS) algorithm, which can be
summarized as follows.

Unsupervised FCLS (UFCLS) Algorithm

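1) Initial condition:

Select $\varepsilon$ to be a prescribed error threshold and let
$\mathbf{m}_0 = \arg\{\max_{\mathbf{r}} [\mathbf{r}^{T}\mathbf{r}]\}$ where r is
run over all image pixel vectors. Let k = 0.

2) Find $\mathbf{m}_1$ that yields the largest
$\mathrm{LSE}^{(0)}(\mathbf{r}) = (\mathbf{r} - \mathbf{m}_0)^{T}(\mathbf{r} - \mathbf{m}_0)$, i.e.,
$\mathbf{m}_1 = \arg\{\max_{\mathbf{r}} \mathrm{LSE}^{(0)}(\mathbf{r})\}$.

3) Let $k \leftarrow k + 1$ and apply the FCLS algorithm with the signature matrix
$\mathbf{M} = [\mathbf{m}_0\ \mathbf{m}_1 \cdots \mathbf{m}_k]$,
$\mathbf{N} = \begin{bmatrix} \delta\mathbf{M} \\ \mathbf{1}^{T} \end{bmatrix}$ and
$\mathbf{s} = \begin{bmatrix} \delta\mathbf{r} \\ 1 \end{bmatrix}$
to estimate the abundance fractions of $\mathbf{m}_0, \mathbf{m}_1, \ldots, \mathbf{m}_k$, denoted by
$\hat{\alpha}_0^{(k)}(\mathbf{r}), \hat{\alpha}_1^{(k)}(\mathbf{r}), \ldots, \hat{\alpha}_k^{(k)}(\mathbf{r})$.

4) Find the least-squares error defined by

$$
\mathrm{LSE}^{(k)}(\mathbf{r}) = \Bigl(\mathbf{r} - \sum_{i=0}^{k}\hat{\alpha}_i^{(k)}(\mathbf{r})\,\mathbf{m}_i\Bigr)^{T}
\Bigl(\mathbf{r} - \sum_{i=0}^{k}\hat{\alpha}_i^{(k)}(\mathbf{r})\,\mathbf{m}_i\Bigr) \tag{10.3}
$$

and check whether $\mathrm{LSE}^{(k)}(\mathbf{r}) < \varepsilon$ for all r. If it is, the algorithm
stops; otherwise continue.

5) Find $\mathbf{m}_{k+1} = \arg\{\max_{\mathbf{r}} \mathrm{LSE}^{(k)}(\mathbf{r})\}$. Go to step 3.

It should be noted that $\{\mathrm{LSE}^{(k)}(\mathbf{r})\}_k$ is a monotonically decreasing
sequence in k that is bounded below by zero; thus, it converges. Similarly, such an LSE-based
unsupervised approach can also be used
to extend LS specified by (3.12), SCLS, NSCLS, NCLS and NNCLS to Unsupervised
LS, Unsupervised SCLS, Unsupervised NSCLS, Unsupervised NCLS and Unsupervised
NNCLS.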
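
The UFCLS steps can be sketched as a short loop. This is an illustrative sketch under
stated assumptions, not the book's implementation: SciPy's `nnls` again stands in for the
NCLS routine, the helper names `fcls` and `ufcls` and the cap `max_signatures` are
hypothetical, and the threshold is also tested on LSE^(0) before the first signature is
added, which the listing above does not do.

```python
import numpy as np
from scipy.optimize import nnls

def fcls(M, r, delta=1e-3):
    """FCLS abundances via NNLS on the augmented system (10.1)-(10.2)."""
    p = M.shape[1]
    N = np.vstack([delta * M, np.ones((1, p))])
    s = np.append(delta * r, 1.0)
    alpha, _ = nnls(N, s)
    return alpha

def ufcls(pixels, eps, max_signatures=10, delta=1e-3):
    """pixels: (num_pixels, L). Returns chosen pixel indices and the final LSE per pixel."""
    # Step 1: m_0 is the pixel vector with the largest r^T r (the brightest pixel).
    chosen = [int(np.argmax(np.sum(pixels**2, axis=1)))]
    # Step 2: LSE^(0)(r) = (r - m_0)^T (r - m_0).
    lse = np.sum((pixels - pixels[chosen[0]])**2, axis=1)
    while lse.max() >= eps and len(chosen) < max_signatures:
        # Steps 2 and 5: the pixel with the largest error becomes the next signature.
        chosen.append(int(np.argmax(lse)))
        M = pixels[chosen].T                                    # (L, k+1)
        # Step 3: FCLS abundances of every pixel with respect to m_0, ..., m_k.
        abund = np.array([fcls(M, r, delta) for r in pixels])
        # Step 4: LSE^(k)(r) from (10.3).
        lse = np.sum((pixels - abund @ M.T)**2, axis=1)
    return chosen, lse
```

Returning pixel indices rather than spectra makes it easy to locate the selected pixels in
the scene, which is how the signatures are interpreted in the experiments that follow.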



10.3 MODIFIED FULLY CONSTRAINED LEAST-SQUARES (MFCLS) 
APPROACH 

As mentioned previously, the main difficulty with solving constrained linear mixing
problems is the abundance nonnegativity constraint, which prohibits one from using the
Lagrange multiplier method to find solutions analytically. In this section, we propose an
alternative approach. Instead of directly dealing with the inequalities $\alpha_j \ge 0$ for each
$1 \le j \le p$, we replace them with the absolute abundance sum-to-one constraint (AASC),
$\sum_{j=1}^{p}|\alpha_j| = 1$. The advantage of AASC is that the Lagrange multiplier method is
now applicable and can be used to derive an iterative algorithm that leads to a desired optimal
constrained least-squares solution. Additionally, AASC enables us to exclude negative
abundance fractions from solutions. In other words, the only way to satisfy both
constraints, sum-to-one ($\sum_{j=1}^{p}\alpha_j = 1$) and AASC ($\sum_{j=1}^{p}|\alpha_j| = 1$),
is that all the abundance fractions must be nonnegative. So, a modified least-squares linear
mixing problem with constraints ASC, $\sum_{j=1}^{p}\alpha_j = 1$, and AASC,
$\sum_{j=1}^{p}|\alpha_j| = 1$, can be cast as follows

$$
\min_{\boldsymbol{\alpha} \in \Delta} \bigl\{ (\mathbf{r} - \mathbf{M}\boldsymbol{\alpha})^{T}(\mathbf{r} - \mathbf{M}\boldsymbol{\alpha}) \bigr\} \tag{10.4}
$$

subject to

$$
\Delta = \Bigl\{ \boldsymbol{\alpha} \Bigm| \sum_{j=1}^{p}\alpha_j = 1 \ \text{and}\ \sum_{j=1}^{p}|\alpha_j| = 1 \Bigr\} \tag{10.5}
$$
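
The claim that ASC and AASC together force every abundance fraction to be nonnegative can
be checked directly. The short argument below is a sketch that uses only the two
constraints in (10.5):

```latex
\sum_{j=1}^{p}\bigl(|\alpha_j| - \alpha_j\bigr)
  = \sum_{j=1}^{p}|\alpha_j| - \sum_{j=1}^{p}\alpha_j = 1 - 1 = 0,
\qquad |\alpha_j| - \alpha_j \ge 0 \ \text{for every } j
\;\Longrightarrow\;
|\alpha_j| = \alpha_j, \ \text{i.e. } \alpha_j \ge 0 \ \text{for all } 1 \le j \le p .
```

A sum of nonnegative terms can vanish only if every term vanishes, so no component can be
negative once both equality constraints hold.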



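Using an argument similar to the one used to derive PCLS solutions in Chapter 3, an
objective function can be obtained by

$$
J(\boldsymbol{\alpha}) = \tfrac{1}{2}(\mathbf{r} - \mathbf{M}\boldsymbol{\alpha})^{T}(\mathbf{r} - \mathbf{M}\boldsymbol{\alpha})
 + \lambda_1\Bigl(\sum_{j=1}^{p}\alpha_j - 1\Bigr) + \lambda_2\Bigl(\sum_{j=1}^{p}|\alpha_j| - 1\Bigr). \tag{10.6}
$$

Differentiating (10.6) with respect to $\boldsymbol{\alpha}$ and setting to zero yields

$$
\left.\frac{\partial J(\boldsymbol{\alpha})}{\partial \boldsymbol{\alpha}}\right|_{\hat{\boldsymbol{\alpha}}_{\mathrm{MFCLS}}} = 0
\;\Longrightarrow\;
\hat{\boldsymbol{\alpha}}_{\mathrm{MFCLS}}
 = \hat{\boldsymbol{\alpha}}_{\mathrm{LS}} - (\mathbf{M}^{T}\mathbf{M})^{-1}\bigl[\lambda_1\mathbf{1} + \lambda_2\,\mathrm{sign}(\boldsymbol{\alpha})\bigr] \tag{10.7}
$$

where $\hat{\boldsymbol{\alpha}}_{\mathrm{LS}} = (\mathbf{M}^{T}\mathbf{M})^{-1}\mathbf{M}^{T}\mathbf{r}$ is the
unconstrained least-squares estimate of $\boldsymbol{\alpha}$ given by
(3.12). We then substitute for $\boldsymbol{\alpha}$ in (10.7) using the following constraints

$$
\sum_{j=1}^{p}\alpha_j = \mathbf{1}^{T}\boldsymbol{\alpha} = 1 \tag{10.8}
$$

$$
\sum_{j=1}^{p}|\alpha_j| = \mathrm{sign}(\boldsymbol{\alpha})^{T}\boldsymbol{\alpha} = 1 \tag{10.9}
$$

to compute $\lambda_1$ and $\lambda_2$. The $\mathrm{sign}(\boldsymbol{\alpha})$ in (10.9) is defined
by the sign function of $\boldsymbol{\alpha}$ with
the j-th component being the sign of $\alpha_j$. More precisely,
$\mathrm{sign}(\boldsymbol{\alpha}) = (\beta_1, \beta_2, \ldots, \beta_p)^{T}$ is defined by

$$
\beta_j = \begin{cases} \alpha_j / |\alpha_j|; & \text{if } \alpha_j \ne 0 \\ 0; & \text{if } \alpha_j = 0 \end{cases} \tag{10.10}
$$

Iteratively computing $\lambda_1$, $\lambda_2$ and $\hat{\boldsymbol{\alpha}}_{\mathrm{MFCLS}}$ using
(10.7)-(10.9) yields a solution to
(10.4)-(10.5). The details of implementing this algorithm are given below.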

MFCLS Algorithm

1. Initialization: Set $\hat{\boldsymbol{\alpha}}_{\mathrm{MFCLS}} = \hat{\boldsymbol{\alpha}}_{\mathrm{SCLS}}$ using (3.17).

2. Compute $\lambda_1$ and $\lambda_2$ using (10.8)-(10.9).

3. Compute $\hat{\boldsymbol{\alpha}}_{\mathrm{MFCLS}} = \hat{\boldsymbol{\alpha}}_{\mathrm{LS}} - (\mathbf{M}^{T}\mathbf{M})^{-1}[\lambda_1\mathbf{1} + \lambda_2\,\mathrm{sign}(\boldsymbol{\alpha})]$.

4. If there exists one component in $\hat{\boldsymbol{\alpha}}_{\mathrm{MFCLS}}$ which is negative, go to
step 2. Otherwise, stop.

It should be noted that step 1 of the MFCLS algorithm is initialized by taking
advantage of the SCLS solution $\hat{\boldsymbol{\alpha}}_{\mathrm{SCLS}}$ obtained by (3.17). The
stopping criterion described in step 4 provides a rule for when the algorithm is terminated,
in which case all components must be nonnegative. In general, it requires a fair amount of
computing time to reach this requirement. So, in a real implementation, we suggest a
simpler stopping rule based on a preselected threshold $\varepsilon$, which will
guarantee a quick termination.
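
The iteration can be sketched as follows. This is a minimal illustration under stated
assumptions: the SCLS start is taken to be the usual closed-form sum-to-one constrained
least-squares solution (assumed to be what (3.17) denotes), the two multipliers are
obtained by substituting (10.7) into (10.8)-(10.9), which gives a 2 x 2 linear system, and
the tolerance `tol` and the cap `max_iter` are hypothetical safeguards that do not appear
in the text. Convergence is not guaranteed by this sketch.

```python
import numpy as np

def mfcls(M, r, tol=1e-9, max_iter=100):
    """Sketch of the MFCLS iteration for a (L, p) signature matrix M and pixel r."""
    p = M.shape[1]
    ones = np.ones(p)
    A = M.T @ M
    alpha_ls = np.linalg.solve(A, M.T @ r)     # unconstrained LS estimate
    A_inv_1 = np.linalg.solve(A, ones)         # (M^T M)^{-1} 1
    # Step 1: start from the SCLS solution (closed form, assumed to match (3.17)).
    alpha = alpha_ls - A_inv_1 * (ones @ alpha_ls - 1.0) / (ones @ A_inv_1)
    for _ in range(max_iter):
        # Step 4: stop once no component is negative (up to the tolerance).
        if alpha.min() >= -tol:
            break
        sgn = np.sign(alpha)                   # sign vector as in (10.10)
        A_inv_s = np.linalg.solve(A, sgn)      # (M^T M)^{-1} sign(alpha)
        # Step 2: solve 1^T alpha = 1 and sign(alpha)^T alpha = 1 for the multipliers.
        G = np.array([[ones @ A_inv_1, ones @ A_inv_s],
                      [sgn @ A_inv_1, sgn @ A_inv_s]])
        rhs = np.array([ones @ alpha_ls - 1.0, sgn @ alpha_ls - 1.0])
        lam1, lam2 = np.linalg.solve(G, rhs)
        # Step 3: update the estimate with (10.7).
        alpha = alpha_ls - (lam1 * A_inv_1 + lam2 * A_inv_s)
    return alpha
```

For example, with M equal to the 3 x 3 identity and r = (0.8, 0.3, -0.1), the SCLS start has
one negative component and a single pass of steps 2 and 3 moves the estimate to
(0.75, 0.25, 0).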



10.4 COMPUTER SIMULATIONS AND REAL HYPERSPECTRAL IMAGE 
EXPERIMENTS 

This section contains a series of computer simulations and experiments to evaluate
the comparative performance of LS, SCLS, NSCLS, NCLS, NNCLS, FCLS and their
corresponding unsupervised versions. First, we conducted computer simulations to
demonstrate advantages of FCLS and UFCLS. Then real hyperspectral image data were
used to show the superior performance of FCLS and UFCLS in comparison to other
methods. In implementing FCLS and UFCLS, the value of $\delta$ used in (10.1) and (10.2) was
fixed at $\delta = 1.0 \times 10^{?}$ except for HYDICE experiments where various $\delta$ values were
explored for performance analysis.

10.4.1 Computer Simulations 

In the following simulations, two experiments were designed to demonstrate the 
performance of the LS, SCLS, NSCLS, NCLS, NNCLS, FCLS and MFCLS methods, 
(1) when the information of all signatures is completely known and (2) when some false 
information is used. 

Example 10.1 Signature Matrix with Three Distinct Target Signatures

The same set of 400 simulated pixels, $\{\mathbf{r}_i\}_{i=1}^{400}$, used in Example 7.1 was
used for the experiments. They were simulated as follows. We started the first pixel vector
with 100% red soil and 0% dry grass, then increased dry grass by 0.25% and decreased red
soil by 0.25% at every pixel vector until the 400th pixel vector, which contained 100% dry
grass. We then added creosote leaves to pixel vector numbers 198-202 at abundance fractions
of 10% while reducing the abundance of red soil and dry grass accordingly. For example, after
addition of creosote leaves, the resulting pixel vector 200 contained 10% creosote leaves,
45% red soil and 45% dry grass. White Gaussian noise was also added to each pixel
vector to achieve a 30:1 signal-to-noise ratio. Fig. 10.1 shows the results of the LS,
SCLS, NSCLS, NCLS, NNCLS, FCLS and MFCLS methods in detection of creosote
leaves, where all of the methods performed similarly.

Figure 10.1. Classification results of LS, SCLS, NSCLS, NCLS, NNCLS, FCLS and MFCLS

In order to further compare their capabilities at quantifying creosote leaves, the
squared errors between the actual and estimated abundance fractions of creosote leaves
were averaged over 400 pixels. The resulting quantification errors for each method were
$2.588 \times 10^{?}$ for LS, $7.850 \times 10^{?}$ for SCLS, $3.717 \times 10^{?}$ for NSCLS,
$1.318 \times 10^{?}$ for NCLS, $1.303 \times 10^{?}$ for NNCLS and $3.715 \times 10^{?}$ for both
FCLS and MFCLS. Obviously,
FCLS produced the best quantification result in the sense of minimizing the average of
squared errors. It is interesting to note that NSCLS, FCLS and MFCLS performed
nearly the same. However, this will not be true in the following simulation.
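
The construction of the 400 simulated pixels and the quantification error can be sketched
as follows. This is an illustration under several assumptions that are not stated in the
text: dry grass is taken as k/400 for pixel k (which matches pixel 200 and pixel 400 as
described, so pixel 1 starts at 0.25% rather than 0%), red soil and dry grass are scaled by
0.9 where creosote leaves are added, the SNR is taken as 50% reflectance divided by the
noise standard deviation, and `signatures` is a hypothetical 3 x L array of reflectance
spectra.

```python
import numpy as np

def simulate_abundances(n=400, creosote_pixels=range(198, 203), creosote_frac=0.10):
    """Return an (n, 3) array of [red soil, dry grass, creosote leaves] abundances."""
    k = np.arange(1, n + 1)
    dry = k / n                          # +0.25% dry grass per pixel when n = 400
    red = 1.0 - dry                      # red soil decreases by the same amount
    creo = np.zeros(n)
    idx = np.asarray(list(creosote_pixels)) - 1    # pixel numbers are 1-based
    creo[idx] = creosote_frac
    red[idx] *= 1.0 - creosote_frac      # assumed proportional reduction
    dry[idx] *= 1.0 - creosote_frac
    return np.column_stack([red, dry, creo])

def simulate_pixels(abundances, signatures, snr=30.0, seed=0):
    """Linear mixing plus white Gaussian noise; sigma = 0.5 / snr is an assumed convention."""
    rng = np.random.default_rng(seed)
    clean = abundances @ signatures      # signatures: (3, L) reflectance spectra
    return clean + rng.normal(0.0, 0.5 / snr, size=clean.shape)

def mean_squared_abundance_error(true_frac, estimated_frac):
    """Average squared error between actual and estimated abundance fractions."""
    diff = np.asarray(true_frac, dtype=float) - np.asarray(estimated_frac, dtype=float)
    return float(np.mean(diff**2))
```

Passing the creosote column of the true abundances together with the estimates from any of
the seven methods to `mean_squared_abundance_error` gives a figure of the same kind as the
quantification errors quoted above.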

Example 10.2 Effects of Two Additional, Less Spectrally Distinct, Targets

The target signature matrix M was made up of the spectral signatures of all targets in
the image scene. An image pixel does not necessarily contain all these targets and
may contain only a few of them. In order to demonstrate the effects of targets that are used
in M but absent from a pixel, the same simulated mixed pixel vectors used in Example
10.1 were used. However, two additional signatures, blackbrush and sagebrush, were added to
the target signature matrix M. These signatures were not actually present in the pixel vectors,
that is, the abundance fractions of blackbrush and sagebrush in these 400 simulated
pixels are 0%. In this scenario, the signature matrix
$\mathbf{M} = [\mathbf{m}_1\ \mathbf{m}_2\ \mathbf{m}_3\ \mathbf{m}_4\ \mathbf{m}_5]$ was assumed
to consist of these five spectral signatures with abundance fractions given by
$\boldsymbol{\alpha} = (\alpha_1\ \alpha_2\ \alpha_3\ \alpha_4\ \alpha_5)^{T}$. Fig. 10.2 shows the
results of LS, SCLS, NSCLS, NCLS,
NNCLS, FCLS and MFCLS in detection of creosote leaves.

Figure 10.2. Classification results of LS, SCLS, NSCLS, NCLS, NNCLS, FCLS and MFCLS where M
consisted of five endmembers: dry grass, red soil, creosote leaves, sagebrush and blackbrush



Unlike Fig. 10.1, the performance of LS, SCLS and NSCLS was very poor, and their
respective averaged squared quantification errors, $2.556 \times 10^{?}$ for LS,
$1.945 \times 10^{?}$ for SCLS and $7.960 \times 10^{?}$ for NSCLS, were significantly worse than
those produced by the other three methods, $4.823 \times 10^{?}$ for NCLS, $4.907 \times 10^{?}$
for NNCLS and $2.806 \times 10^{?}$
for FCLS. The detection performance of LS was considerably decreased because the
undesired signature annihilator used in LS nulled the undesired signatures,
blackbrush and sagebrush, whose spectra are similar to that of creosote leaves. The
performance of SCLS and NSCLS was reduced because they assumed there were five
signatures and their estimated abundance fractions must sum to one. Since the
spectrum of creosote leaves is close to those of blackbrush and sagebrush, the estimated
abundance fraction of creosote leaves was forced to be shared with the nonexistent blackbrush
and sagebrush. As expected, SCLS and NSCLS did not perform well. By contrast,
ANC significantly improved the performance of NCLS, NNCLS and FCLS. By
implementing ANC these methods were able to effectively select an appropriate subset of
material signatures for unmixing. This experiment demonstrated that NCLS, NNCLS
and FCLS performed significantly better than LS, SCLS and NSCLS as the number and
similarity of spectral signatures in the signature matrix M increase. In both examples,
FCLS performed the best while LS was the worst.

10.4.2 AVIRIS Image Experiments 

The data used in the following experiments are the AVIRIS image in Fig. 1.6(a). For
these experiments, representative pixels of each signature were manually extracted from
the image scene and their average was used to represent the signature. For example, the
signature for the playa was obtained by averaging 5033 pixels located at the bottom right
corner of the image scene. The shade signature was generated by averaging pixels in a
5 x 5 square located in the darkest area of the scene. Each of the signatures used for
cinders and rhyolite was produced by averaging four pixels in the scene. One pixel was
extracted for the vegetation signature. Three experiments were conducted using different
degrees of prior information applied to signatures.
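
Averaging manually selected pixels into a single signature can be sketched as below. The
data layout (a `rows x cols x bands` array) and the function name `mean_signature` are
assumptions made for illustration; the window coordinates would come from visual inspection
of the scene.

```python
import numpy as np

def mean_signature(cube, row_slice, col_slice):
    """Average the spectra of all pixels inside a rectangular window.

    cube      : (rows, cols, bands) image array (assumed layout)
    row_slice : slice selecting the window rows
    col_slice : slice selecting the window columns
    """
    window = cube[row_slice, col_slice, :]
    # Flatten the window to a list of pixel spectra and average band by band.
    return window.reshape(-1, cube.shape[2]).mean(axis=0)
```

For instance, a 5 x 5 shade window whose top-left corner is at a hypothetical position
`(r0, c0)` would be averaged with `mean_signature(cube, slice(r0, r0 + 5), slice(c0, c0 + 5))`.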

Example 10.3 Complete Prior Signature Information 

With these five signatures manually selected as above, a signature matrix M was
formed. Fig. 10.3 shows the quantification results of LS, SCLS, NSCLS, NCLS,
NNCLS, FCLS and MFCLS using cinders, playa, rhyolite, vegetation and shade as the
desired signatures respectively.

The estimated abundance fraction values for each signature correspond directly to the
gray scale values for each image in Fig. 10.3(a-f). Underneath each of the images
generated by LS, SCLS, NSCLS, NCLS and NNCLS, the ranges of abundance fraction
values for their images are specified since they are not fully constrained
methods. As we can see, from a detection point of view all of the methods performed
similarly. However, in terms of quantification the performances of NSCLS, NCLS,
NNCLS, FCLS and MFCLS are very close and better than those of LS and SCLS.

Example 10.4 Partial Prior Signature Information 

In this experiment, only one of five signatures was assumed to be known a priori. 
In this case, the UFCLS algorithm was used to generate additional signatures to form a 
desired signature matrix M to quantify this particular signature. Fig. 10.4(a) shows the 
results of cinders after 16 iterations of the UFCLS algorithm. Similarly, Fig. 10.4(b)-(e) 
shows the results of playa, rhyolite, vegetation and shade after 3, 19, 25 and 5 iterations 
respectively. 




Figure 10.4. Results of UFCLS for (a) cinders, (b) playa, (c) rhyolite, (d) vegetation and (e) shade with
partial knowledge, where for each image only the signature of interest was known a priori.

Compared to Fig. 10.3, their results were very similar to those produced by NCLS, 
NNCLS, FCLS and MFCLS. This implies that the UFCLS using a single signature of 
interest performed as well as FCLS and MFCLS using all the five signatures. 

Example 10.5 No Prior Signature Information 

The following experiment was designed to explore the utility of the
UFCLS algorithm when no prior information is assumed. In order to initialize the
algorithm, a prescribed error threshold $\varepsilon$ and an initial material signature
$\mathbf{m}_0$ were required. To determine a value for $\varepsilon$ we ran the UFCLS algorithm
and plotted the maximum least-squares error values resulting from (10.3), as shown in Fig. 10.5.

Figure 10.5. Plot of LSE values

Since the plot leveled off at iteration k = 5, a threshold value of $5 \times 10^{?}$ was
selected for $\varepsilon$. For the initial material signature $\mathbf{m}_0$, a pixel with the
maximum vector length, i.e., the brightest pixel in the image scene, was selected and turned out
to be a pixel located in the center of the dry playa lake. Using the spectral signature of this
pixel as the initial signature, $\mathrm{LSE}^{(0)}(\mathbf{r})$ was calculated for all pixels r
in the image scene and the resulting image is shown in Fig. 10.6(a). The spectral signature of a
shade image pixel that yielded the largest LSE in Fig. 10.6(a) was selected as the first
signature $\mathbf{m}_1$. At this point, $\mathbf{m}_0$ and $\mathbf{m}_1$ represented two pixels
that corresponded to the brightest and darkest pixels in the image scene. The UFCLS algorithm
continued to calculate $\mathrm{LSE}^{(1)}(\mathbf{r})$ as specified in step 4 and the result is
shown in Fig. 10.6(b). A pixel containing cinders that produced the largest LSE in Fig. 10.6(b)
was selected and its spectral signature was used as the second signature $\mathbf{m}_2$. Then,
following the same procedure, the $\mathrm{LSE}^{(2)}(\mathbf{r})$ image is shown in Fig. 10.6(c),
where a pixel at the edge of the dry lakebed had the largest LSE and its spectral signature was
selected as the third desired signature $\mathbf{m}_3$. Once again, using $\mathbf{m}_0$,
$\mathbf{m}_1$, $\mathbf{m}_2$ and $\mathbf{m}_3$, the $\mathrm{LSE}^{(3)}(\mathbf{r})$ image was
generated in Fig. 10.6(d) and the spectral signature of a vegetation pixel that generated
the largest LSE was selected as the fourth desired signature $\mathbf{m}_4$. Interestingly, Figs.
10.6(c) and (d) are almost identical except for the pixel at the edge of the dry lakebed
represented by $\mathbf{m}_3$. In this particular situation, UFCLS found an anomalous signature
$\mathbf{m}_3$ that could not be identified by visual inspection of the image scene.
Accordingly, UFCLS can be used to detect anomalies. This task cannot be achieved by
the supervised FCLS method. Then with $\mathbf{m}_0$, $\mathbf{m}_1$, $\mathbf{m}_2$,
$\mathbf{m}_3$ and $\mathbf{m}_4$, $\mathrm{LSE}^{(4)}(\mathbf{r})$ was
generated in Fig. 10.6(e), where a rhyolite pixel produced the largest LSE and its spectral
signature was designated as the fifth signature, $\mathbf{m}_5$. Since the largest LSE in the
$\mathrm{LSE}^{(5)}(\mathbf{r})$ image labeled by Fig. 10.6(f) was below the prescribed threshold
$\varepsilon = 5 \times 10^{?}$, the UFCLS algorithm was terminated. At this point, the error
image in Fig. 10.6(f) took on a chi-square distribution, except for a few pixels along the edge
of the dry lakebed and a pixel in the upper left corner of the image. This chi-square
distribution provided confirmation that the six signatures $\{\mathbf{m}_j\}_{j=0}^{5}$ generated
by the UFCLS algorithm represented the significant materials in the image scene.




Figure 10.6. Pixel detection of UFCLS algorithm



The six signatures that were found in Fig. 10.6 were used to form a signature
matrix M to implement LS, SCLS, NSCLS, NCLS, NNCLS, FCLS and MFCLS. The
results are shown in Fig. 10.7, where only the ranges of estimated abundance fractions for
LS and SCLS are specified underneath the images they generated. This is because the
abundance fractions estimated by the other five classifiers were all in the range [0, 1].

Figure 10.7. Results of LS, SCLS, NSCLS, NCLS, NNCLS, FCLS and MFCLS



Compared to the images in Fig. 10.3, the results in Fig. 10.7 were similar, but 
quantification results were worse, particularly for playa and shade. This is primarily due 
to the fact that only a single pixel was used to generate each of the signatures in M. When a 
relatively large area such as the dry lake or shade needs to be classified, a single pixel cannot 
represent the area well enough to account for its spectral variability. One way to resolve this 
dilemma is to add more sample pixels to generate a more robust signature. These 
additional sample pixels can be selected by using spectral measures such as ED, SAM or 
SID discussed in Chapter 2. 
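
As an illustration of how additional sample pixels might be gathered, the following minimal sketch uses the spectral angle between a seed pixel and candidate pixels, and averages every candidate that falls within a tolerance. The function names, the toy spectra and the tolerance `max_angle` are hypothetical choices made for this example; they are not taken from the experiments in this chapter.

```python
import math

def spectral_angle(a, b):
    """Spectral angle (in radians) between two spectra of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    # Clamp to [-1, 1] so that rounding cannot push acos out of its domain.
    cos_angle = max(-1.0, min(1.0, dot / (norm_a * norm_b)))
    return math.acos(cos_angle)

def robust_signature(seed, pixels, max_angle):
    """Average the seed with every pixel whose angle to the seed is <= max_angle."""
    selected = [seed] + [p for p in pixels if spectral_angle(seed, p) <= max_angle]
    count = len(selected)
    return [sum(band) / count for band in zip(*selected)]

# Hypothetical three-band spectra: two candidates resemble the seed, one does not.
seed = [1.0, 2.0, 3.0]
candidates = [[2.0, 4.0, 6.2], [3.0, 1.0, 0.5], [0.9, 2.1, 3.0]]
signature = robust_signature(seed, candidates, max_angle=0.1)
```

Because the spectral angle ignores overall brightness, a candidate that is a scaled copy of the seed is accepted, which may or may not be desirable depending on how the signature will be used for quantification.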
10.4.3 HYDICE Image Experiments 

The 15-panel HYDICE scene in Fig. 1.7(a) was used for experiments. Three 
examples were studied to demonstrate how different levels of signature knowledge affect 
the performance of constrained MFC. 

Example 10.6 Complete Prior Signature Information 

The five panel signatures, P1, P2, P3, P4 and P5 shown in Fig. 1.8 were used to 
form a desired signature matrix for M. In addition to the panel signatures, four more 
signatures were generated to represent the background. A tree signature was obtained by 
averaging 768 pixels in a large rectangle of pixels from the left side of the image. A grass 
signature was generated by averaging 1152 pixels from two large rectangles, one located 
between the trees and the panels, and the second at the right side of the image. Averaging 
8 pixels from the gravel road and 7 pixels of shade located along the tree line produced 
road and shade signatures. Since most of the road was shaded, it is very difficult to see 
from Fig. 1.7(a). The spectra of these four background signatures are shown in Fig. 10.8. 




Figure 10.8. Four background signatures (grass, tree, road, and shade) in Fig. 1.7(a) 

Including the four background signatures along with the five panel signatures gave a 
total of nine signatures that were used to form M. Fig. 10.9 shows the performance of 
LS, SCLS, NSCLS, NCLS, NNCLS, FCLS and MFCLS where figures from left to 
right are detection results for panels from rows 1 to 5 respectively. 










Like Fig. 10.3, the range of abundance fractions estimated by LS, SCLS, NSCLS, 
NCLS and NNCLS is specified underneath each image they generated. Apparently the 
best results were produced by NCLS. Interestingly, FCLS and MFCLS had the worst 
performance in detection of the panels in row 2, but performed reasonably well for other 
panels. This shows that fully constrained MFC does not necessarily perform better than 
unconstrained MFC or partially constrained MFC in target detection. 

Until now we simply set the value of $\delta$ at $\delta = 1.0 \times 10^{?}$ for use in FCLS and 
UFCLS. In order to demonstrate its effect on the performance of FCLS, the same experiment 
was conducted with $\delta = 1.0 \times 10^{?}$ and $\delta = 1.0 \times 10^{?}$. These results are shown in Fig. 
10.10(a)-(b), where the results in Fig. 10.10(b) were better than those in Fig. 10.10(a). 
Comparison of these results in Fig. 10.10 to those of NCLS in Fig. 10.9 illustrates 
the increasingly similar performance as the value of $\delta$ was increased. This effect was 
expected, since $\delta$ controls the impact of ASC and a reduction of this impact should 
correspond to increased similarity between the results of FCLS and NCLS. 

An advantage of using panels for experiments is that ground truth provides their 
exact dimensions, and we can use this information to quantify the amount of each panel 
material present in the scene. Since each image pixel is approximately 1.5 m x 1.5 m, 
with an area-per-pixel of 2.25 m$^2$, we can determine the quantity of panel materials by 
dividing the area of the 3 m x 3 m, 2 m x 2 m and 1 m x 1 m panels by the area-per-pixel, 
which results in quantification values of 4.0, 1.78 and 0.44 pixels respectively. 
Consequently, the image contains a quantity of approximately 6.22 pixels of each panel 
material. Table 10.1 contains the image quantification results for each of LS, SCLS, 
NSCLS, NCLS, NNCLS, FCLS and MFCLS. As we can see, each method performed 
poorly. 
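
The ground-truth quantities quoted above follow from simple area arithmetic. The short sketch below reproduces them; the variable names are chosen only for this illustration.

```python
# Ground-truth panel quantities expressed in pixels.
PIXEL_SIDE_M = 1.5                      # approximate ground sampling distance
pixel_area = PIXEL_SIDE_M ** 2          # area covered by one pixel, in m^2

panel_sides_m = [3.0, 2.0, 1.0]         # the three panel sizes in each row
pixels_per_panel = [side ** 2 / pixel_area for side in panel_sides_m]

# Each panel material appears once at each size, so the per-material total
# is the sum over the three panels.
total_pixels = sum(pixels_per_panel)
```

A perfect quantifier would therefore report roughly 6.22 for each row of Table 10.1, which makes the size of the errors in that table easy to judge.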



Table 10.1. Quantification of LS, SCLS, NSCLS, NCLS, NNCLS, FCLS and MFCLS 

     LS        SCLS      NSCLS     NCLS      NNCLS     FCLS      FCLS      FCLS      MFCLS
                                                       δ=1x10^?  δ=1x10^?  δ=1x10^?
P1   -73.97    -67.72    143.35    106.30    105.94    107.97    109.18    106.30    83.53
P2   60.05     53.66     152.40    67.29     64.31     608.29    355.33    67.33     623.83
P3   28.37     27.08     112.67    24.02     22.90     36.66     28.41     24.02     38.77
P4   137.45    129.06    215.44    44.34     45.48     169.11    123.00    44.34     120.09
P5   -101.32   -94.71    139.41    53.93     41.89     26.38     36.59     53.94     107.80


In order to determine how each of the methods performed in quantification of each of 
the three panel sizes, Table 10.2 contains quantification results for each of the methods 
using only the R and Y pixels of each panel. 



Table 10.2. Quantification results over only R and Y pixels for LS, SCLS, NSCLS, NCLS, NNCLS, FCLS and MFCLS 

     LS        SCLS      NSCLS     NCLS      NNCLS     FCLS      FCLS      FCLS      MFCLS
                                                       δ=1x10^?  δ=1x10^?  δ=1x10^?
p11  5.99      4.26      3.70      4.08      4.08      4.11      4.11      4.08      4.04
p12  3.01      2.09      1.83      1.70      1.69      1.71      1.70      1.70      1.70
p13  0.50      -1.01     0.01      0.05      0.05      0.00      0.01      0.05      0.12
p21  5.89      4.75      4.03      4.74      4.66      7.67      6.40      4.74      7.59
p22  0.30      1.40      1.68      1.36      1.30      7.48      4.71      1.36      7.28
p23  -0.02     0.96      0.78      0.73      0.71      2.49      1.75      0.73      2.36
p31  4.39      4.61      3.62      4.59      4.53      3.87      4.14      4.59      3.80
p32  3.86      3.80      2.40      1.76      1.71      1.01      1.25      1.76      0.97
p33  1.28      1.44      0.95      0.58      0.56      0.34      0.39      0.58      0.29
p41  4.53      5.89      4.03      3.16      3.05      3.49      3.62      3.16      3.37
p42  8.66      5.91      3.78      1.46      1.44      1.50      1.62      1.46      1.19
p43  3.48      3.59      2.05      0.44      0.43      0.46      0.43      0.44      0.59
p51  -0.10     -0.07     3.10      4.12      3.94      3.67      3.61      4.12      4.11
p52  0.25      0.75      1.08      1.82      1.81      1.70      1.82      1.82      1.50
p53  -1.21     -0.88     0.13      0.41      0.40      0.26      0.35      0.41      0.19

For instance, the quantification results for $p_{11}$ in row one of Table 10.2 were 
calculated using only the twenty-five pixels corresponding to $p_{11}$'s R and Y pixel mask. 

These results show that NCLS, NNCLS, FCLS ($\delta = 1.0 \times 10^{?}$) and MFCLS performed 
better than other methods in terms of both detection and quantification. Since the large 
background field was not well represented by the four generated background signatures, 
all the methods performed poorly. As noted, the HYDICE sensor can extract objects less 
than 1 meter in size; thus, the sensor may also pick up many unknown targets. This is shown 
in Fig. 10.9. Unfortunately, the knowledge of these unidentified materials cannot be 
obtained a priori. This is a situation where the UFCLS method finds its most useful 
applications. 

Example 10.7 Partial Prior Signature Information 

In this experiment, only the five panel signatures were assumed to be known a 
priori. In this case, the UFCLS algorithm was used to generate additional signatures to 
represent the background. The five panel signatures from Example 10.6 were selected as 
the initial material signature set to initialize the UFCLS algorithm. It is interesting to 
note that at iteration 11 the UFCLS algorithm selected the top R pixel for panel P5 as a 
new material signature. This indicated that one of the two R pixels was actually a mixed 
pixel. In order to determine which was the mixed pixel, UFCLS with no prior signature 
knowledge was used. It selected the bottom R pixel; consequently, only the bottom R 
pixel was used to represent panel P5 in this experiment. As stated before, UFCLS 
requires a prescribed error threshold to determine how many targets will be included in 
the signature matrix M. In order to determine this value we used the UFCLS algorithm 
and plotted the maximum least squares error values resulting from (10.3). Fig. 10.11 
shows the results for iterations of k from 10 to 49. 




Figure 10.11. Results for iterations of k from 10 to 49 

The plot starts to level off after about 18 iterations of the UFCLS algorithm. In 
order to see how the threshold value affects UFCLS, three threshold values ($7.0 \times 10^{?}$, 
$4.0 \times 10^{?}$ and $2.0 \times 10^{?}$) were selected, each of which is marked by a horizontal line 
in Fig. 10.11. Fig. 10.12 shows the results of UFCLS using these three threshold 
values. 
As we can see from Fig. 10.12(a), when the error threshold is too large it results in 
an insufficient number of signatures to represent the image background. Tables 10.3 and 
10.4 contain quantification results for the entire image and for the R and Y pixels 
respectively. 



Table 10.3. Quantification results over all image pixels using the UFCLS method with partial prior signature knowledge 

     UFCLS            UFCLS            UFCLS
     ε = 7.0 x 10^?   ε = 4.0 x 10^?   ε = 2.0 x 10^?
P1   19.7649          13.9114          7.3021
P2   109.4454         41.2200          19.9960
P3   20.3169          14.8010          23.0000
P4   15.2187          11.5537          9.4493
P5   21.5999          11.2960          10.2822


Table 10.4. Quantification results over only R and Y pixels using the UFCLS method with partial prior signature knowledge 

     UFCLS            UFCLS            UFCLS
     ε = 7.0 x 10^?   ε = 4.0 x 10^?   ε = 2.0 x 10^?
p11  3.2691           3.1162           3.0897
p12  1.3128           1.2508           1.2433
p13  0.0413           0.0492           0.0526
p21  4.9350           4.2786           4.2540
p22  2.4201           1.8693           1.6412
p23  1.1844           0.7295           0.5539
p31  3.9199           4.0814           4.3011
p32  1.7502           1.7754           2.0301
p33  0.4203           0.4330           0.5630
p41  2.9124           3.1241           2.9777
p42  1.6915           1.5367           1.7529
p43  0.4634           0.5104           0.4758
p51  3.7763           3.8227           3.9045
p52  1.8771           1.8153           1.8415
p53  0.4937           0.5173           0.4756


As we can see from the quantification values in these two tables, UFCLS using 
partial prior material knowledge performed significantly better than FCLS using 
manually selected material signatures. 
Example 10.8 No Prior Signature Information 



The previous experiments assumed that a priori knowledge of the material 
signatures was available. In this final experiment we conclude by assuming that no prior 
material knowledge is given. As before, we must prescribe an error threshold value to 
terminate the UFCLS algorithm. Again, we used the UFCLS algorithm and plotted the 
maximum least squares error values resulting from (10.3), as shown in Fig. 10.13(a). In 
order to determine an appropriate threshold, the difference between every fifth sample was 
calculated using the following equation 

$$\Delta\mathrm{LSE}^{(k)} = \frac{\max_{\mathbf{r}}\{\mathrm{LSE}^{(k-5)}(\mathbf{r})\} - \max_{\mathbf{r}}\{\mathrm{LSE}^{(k)}(\mathbf{r})\}}{\max_{\mathbf{r}}\{\mathrm{LSE}^{(k-5)}(\mathbf{r})\}} \qquad (10.11)$$

This is the same method used to determine the number of targets or codewords for an 
unsupervised vector quantization method in Brumbley (1998). This function is plotted in 
Fig. 10.13(b) and shows a significant drop at values 11, 23 and 40. 
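
The differential error can be computed directly from the sequence of maximum least squares errors. The sketch below assumes that the maximum LSE of each iteration has already been stored in a dictionary keyed by the iteration number, and that the difference is taken against the value five iterations earlier, as the phrase "every fifth sample" suggests; the function name and the sample values are hypothetical.

```python
def differential_lse(max_lse, lag=5):
    """Relative drop of the maximum LSE between iteration k - lag and iteration k.

    max_lse maps an iteration number k to max over pixels of LSE at iteration k.
    Only iterations whose lagged counterpart is available are returned.
    """
    return {
        k: (max_lse[k - lag] - max_lse[k]) / max_lse[k - lag]
        for k in sorted(max_lse)
        if (k - lag) in max_lse
    }

# Hypothetical maximum-LSE values at iterations 1, 6 and 11.
drops = differential_lse({1: 100.0, 6: 50.0, 11: 45.0})
```

In this toy sequence the relative drop falls from 0.5 to 0.1, the kind of change that would be read off a plot such as Fig. 10.13(b) when choosing a stopping threshold.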





(a) (b) 

Figure 10.13. UFCLS Error: (a) Least squares error, (b) Differential least squares error. 



To begin this analysis we selected an error threshold value of $\varepsilon = 1.5 \times 10^{?}$, which 
resulted in termination of the UFCLS algorithm after 11 iterations. The resulting 
$\mathrm{LSE}^{(k)}(\mathbf{r})$ images are shown in Fig. 10.14. 




Figure 10.14. Results of UFCLS using these three threshold values 



In Figs. 10.14(c), 10.14(d) and 10.14(i), R pixels for panels P5, P3 and P1 were 
selected as initial pixels respectively. However, unlike the AVIRIS data, the $\mathrm{LSE}^{(k)}(\mathbf{r})$ 









image in Fig. 10.14(1) was far from a chi-square distribution. Fig. 10.15(a)-(c) shows 
$\mathrm{LSE}^{(k)}(\mathbf{r})$ images for iterations $k$ equal to 23, 40 and 49 respectively. 




Figure 10.15. $\mathrm{LSE}^{(k)}(\mathbf{r})$ images for $k$ equal to 23, 40 and 49 respectively 



Even after 49 iterations of the UFCLS algorithm, the error image still failed to 
achieve a chi-square distribution. This may result from the very fine spatial and spectral 
resolutions of the HYDICE sensor, which can extract targets as small as 1-4 
meters in size. Therefore, the UFCLS algorithm may require a long iterative process before it 
stops. 

In order to compare the various methods, a prescribed error threshold of 
$\varepsilon = 2.54 \times 10^{?}$ was selected for the UFCLS algorithm and resulted in the generation of 
41 signatures. These 41 signatures were used to form a signature matrix M. Using this M, LS, 
SCLS, NSCLS, NCLS, NNCLS, FCLS and MFCLS were implemented. Fig. 10.16 
shows results for m^, ni 33 , m^, ni 35 and m 3 where panel PI, P2, P3, P4, P4 and 
P5 were detected by various methods respectively. 
Figure 10.16. Results of LS, SCLS, NSCLS, NCLS, NNCLS, UFCLS and MFCLS using target signatures 
generated by the UFCLS algorithm 



In each case, an R pixel was selected as a material signature, except for 11133 which 
turned out to be a Y pixel next to the top R pixel of panel p^, . It should be noted that in 
this unsupervised case, the UFCLS algorithm actually detected the panels in row 4 in 
two separate images. This separation could have been avoided if the UFCLS algorithm 
was terminated at iteration 34 or 35. However, determining the most appropriate 
stopping threshold is generally difficult due to the lack of prior information. As we can 
see from these images, NCLS, NNCLS and UFCLS performed the best and significantly 
better than LS, SCLS and NSCLS. An anomaly was also observed in Fig. 10.16(g), 
where there was an unknown signature detected in the upper left corner of the tree line, 
nig. Tables 10.5 and 10.6 contain quantification results for the entire image and for the 
Y and R pixels respectively. 



Table 10.5. Quantification results of LS, SCLS, NSCLS, NCLS, NNCLS, UFCLS and MFCLS for Y pixels 

     LS        SCLS      NSCLS     NCLS      NNCLS     FCLS      MFCLS
p11  3.21      3.14      1.74      3.14      3.12      2.96      2.94
p12  1.02      0.97      0.49      1.29      1.28      1.26      1.23
p13  -0.23     -0.23     0.06      0.09      0.09      0.05      0.05
p21  5.21      4.96      2.37      3.70      3.69      3.37      3.28
p22  1.20      1.30      0.56      1.20      1.17      1.43      1.43
p23  0.47      0.52      0.23      0.57      0.56      0.61      0.68
p31  4.01      4.01      1.91      4.09      4.10      4.03      4.06
p32  2.52      2.52      0.86      1.95      1.94      1.96      1.91
p33  0.77      0.77      0.25      0.49      0.49      0.50      0.45
p41  4.70      4.79      2.18      1.94      1.90      1.97      1.91
p41  -0.97     -1.01     1.20      1.57      1.54      2.05      1.95
p42  -0.19     0.12      0.38      1.42      1.41      1.47      1.44
p42  2.82      2.66      0.90      0.33      0.33      0.35      0.32
p43  0.81      0.93      0.29      0.54      0.53      0.60      0.51
p43  -0.12     -0.18     0.08      0         0         0         0.04
p51  4.51      4.54      2.14      3.85      3.81      3.69      3.67
p52  1.51      1.51      0.50      1.85      1.84      1.79      1.79
p53  0.28      0.19      0.09      0.47      0.46      0.45      0.39

Table 10.6. Quantification results of LS, SCLS, NSCLS, NCLS, NNCLS, UFCLS and MFCLS for R pixels 

      LS          SCLS        NSCLS       NCLS        NNCLS       UFCLS       MFCLS
m_?   -49.2866    -53.4773    8.5554      9.0078      9.0037      9.5068      10.6713
m_33  22.6381     37.7144     34.2775     34.4859     34.7202     28.3764     33.3212
m_4   45.0890     45.0504     23.0562     22.9769     22.9950     24.9895     21.0518
m_22  186.1792    202.9720    109.4225    7.0436      6.9863      8.7071      8.8039
m_35  -135.5971   -143.8910   52.4539     3.2501      3.1987      6.5286      5.0979
m_3   7.5093      -1.95281    9.8006      8.8974      8.8122      8.4374      9.2320


As we can see from the quantification values in these two tables, the NCLS, 
NNCLS and UFCLS performed significantly better than the other methods. 

In the above experiment UFCLS was able to effectively detect and quantify materials 
in the image scene. However, since UFCLS was completely unsupervised, further 
analysis would be required to identify all the unknown materials. This can generally be 
done by calibrating the spectra of the extracted material signatures and comparing them 
against a reference database such as a spectral library. 



10.5 NEAR REAL-TIME IMPLEMENTATION 

In general, a supervised algorithm that performs mixed pixel classification on a 
pixel-by-pixel basis, such as an OSP-based classifier, can be implemented in real time. This 
assumes that complete knowledge is provided a priori. It is also true for pixel-based 
constrained supervised algorithms such as NCLS and FCLS. The difficulty of real-time 
implementation for unsupervised MPC arises from the fact that there is no prior 
knowledge available about the image data. Under this circumstance the necessary 
knowledge must be obtained directly from the image data. As described in Chapter 5, one 
way to obtain such information is to take advantage of the information provided by the 
image data, such as the sample correlation/covariance matrix used in CEM and TCIMF in 
Chapter 4 and in the anomaly detectors in Chapter 6. 

An alternative approach is to design a two-pass algorithm where the first pass 
obtains the desired image information and the second pass 
performs subpixel detection and mixed pixel classification using the knowledge generated 
by the first pass. This section presents a near real-time implementation approach. It is a 
two-pass process, but it merges the two passes into one pass with negligible time delay. It 
only uses line-by-line causal information, in the same way as the implementations of CEM 
in Section 4.5 and RXD in Section 6.6.3. 

Since the major strength of FCLS and UFCLS is quantification, they sometimes 
may not work as effectively as NCLS and UNCLS in terms of target detection and 
classification. This is because the abundance fractions estimated by FCLS must satisfy 
ASC. When image data contain many signal sources, such as the 34 targets required for the 
15-panel HYDICE scene to perform unsupervised subpixel detection and classification in 
Fig. 10.16, the abundance amounts generated by FCLS for each target signature become 
very small as a result of ASC. As expected, FCLS may not perform as well as NCLS and 
UNCLS, which do not impose ASC. So, in this section we present a near real-time 
implementation of UNCLS rather than UFCLS. Nevertheless, a near real-time 
implementation of UFCLS can be carried out in exactly the same way as that of 
UNCLS (Heinz, 2001). It is important to note that the detection and classification are 
conducted simultaneously on a pixel-by-pixel basis. The near real-time UNCLS 
algorithm can be described as follows. 
Let $L$ represent the number of spectral bands and let $\mathbf{r}$ be an $L \times 1$ column pixel 
vector in a hyperspectral image. The first pixel vector $\mathbf{r}_1$ to be analyzed is selected as 
target $t_1$ with its corresponding signature $\mathbf{m}_1$. Using $\mathbf{m}_1$ we form a target signature 
matrix $\mathbf{M} = [\mathbf{m}_1]$. This matrix is then used to estimate the abundance fraction $\hat{\alpha}_1(\mathbf{r}_2)$ 
in the second pixel vector $\mathbf{r}_2$. The LSE between $\mathbf{r}_2$ and its estimated linear mixture 
$\hat{\alpha}_1(\mathbf{r}_2)\mathbf{m}_1$ is then compared to a prescribed error threshold. If this error threshold is 
exceeded then $\mathbf{r}_2$ is selected as target $t_2$ with signature $\mathbf{m}_2$, and a new target signature 
matrix $\mathbf{M} = [\mathbf{m}_1\ \mathbf{m}_2]$ is formed; otherwise, $\mathbf{M}$ remains unchanged. This same procedure 
is repeated on a pixel-by-pixel basis until all pixels have been processed. 
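
The single-pass target generation described above can be sketched as follows. This is only an illustration under stated assumptions: the nonnegatively constrained abundances are found here by a simple coordinate-descent routine that stands in for the NCLS algorithm of this chapter, the LSE is taken as the squared norm of the residual, and the function names, toy pixels and threshold are hypothetical. The look-ahead and target-update refinements are not included.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def nonneg_abundances(signatures, r, sweeps=200):
    """Minimize ||r - sum_j a_j m_j||^2 subject to a_j >= 0 by coordinate descent.

    Returns the abundance list and the squared norm of the final residual.
    Signatures are assumed to be nonzero vectors.
    """
    a = [0.0] * len(signatures)
    residual = list(r)                      # always equals r - sum_j a_j m_j
    for _ in range(sweeps):
        for j, m in enumerate(signatures):
            # Exact minimizer along coordinate j, clipped at zero.
            updated = max(0.0, a[j] + dot(m, residual) / dot(m, m))
            step = updated - a[j]
            if step != 0.0:
                residual = [res - step * mi for res, mi in zip(residual, m)]
                a[j] = updated
    return a, dot(residual, residual)

def sequential_targets(pixels, threshold):
    """Process pixels once, in order; return the indices selected as targets."""
    target_idx = [0]                        # the first pixel is always a target
    for i in range(1, len(pixels)):
        signatures = [pixels[t] for t in target_idx]
        _, lse = nonneg_abundances(signatures, pixels[i])
        if lse > threshold:                 # poorly explained: add as new target
            target_idx.append(i)
    return target_idx

# Hypothetical three-band pixel stream.
pixels = [[1.0, 0.0, 0.0], [2.0, 0.0, 0.0], [0.0, 1.0, 0.0],
          [1.0, 1.0, 0.0], [0.0, 0.0, 3.0]]
selected = sequential_targets(pixels, threshold=1e-6)
```

In this toy stream the second pixel is a scaled copy of the first and the fourth is a nonnegative mixture of the first and third, so neither is added; only pixels that the current signature matrix cannot explain become new targets.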

Using the 15-panel HYDICE image scene in Fig. 1.7(a) as an example, the near real-time 
processing of the UNCLS algorithm was started at the upper-left pixel and 
continued across the top row on a pixel-by-pixel basis. As each new target signature was 
selected, a new classification image was displayed. After completion of the first row, 
each new row of data was processed using the same left-to-right pixel flow. After the first 
row was processed, seventeen targets $t_1, t_2, \ldots, t_{17}$ with their corresponding 
signatures $\mathbf{m}_1, \mathbf{m}_2, \ldots, \mathbf{m}_{17}$ had been detected. The classification results of the first 17 
signatures are shown in Fig. 10.17. 






[Panels: UNCLS $\mathbf{m}_1$ to UNCLS $\mathbf{m}_{17}$, each with estimated abundance range [0, 1], except UNCLS $\mathbf{m}_8$ with range [0, 1.1]] 



Figure 10.17. Classification results after the first row was processed in near real-time 

Figure 10.18 shows classification results after completion of 20 rows in near real-time 
processing, where 31 targets were generated. 

In Fig. 10.18 the panel pixels $p_{11}$, $p_{12}$ and $p_{13}$ were classified as four separate targets 
(19, 20, 22 and 23). This resulted from the pixel-by-pixel real-time implementation, 
which first detected the mixed pixels as targets 19 and 20 (mainly grass with some panel) 
before the purer pixels, targets 22 and 23, were available. This separation also 
occurred for the second row of panels, with the panel pixels $p_{21}$ and $p_{22}$ being classified as 
four separate targets (27, 28, 29 and 30). One possible solution to this problem is to 
monitor the target signature matrix and remove mixed pixel targets upon detection of a 
purer pixel target. However, this solution requires a technique for reclassification of 
pixels once the mixed pixel targets are removed. For instance, the detection of target 23 
would cause the removal of targets 19, 20 and 22 from the target signature matrix, and all 
pixels classified with targets 19, 20 and 22 in the target signature matrix would be 
reclassified. Another, simpler method to avoid detection of mixed pixels is to implement 
a look-ahead procedure in the UNCLS algorithm. A two-row look-ahead is appropriate, 
since its goal is to avoid detection of mixed pixels at the transitions from one target to 
the next. In addition, this look-ahead procedure can also avoid display of discontinuities 
associated with the removal of targets. 




Figure 10.18. Results after 20 rows were processed in real time 

However, even with this two-row look-ahead procedure there is still an additional 
problem with the current UNCLS algorithm in that once it selects a target it is not 
allowed to update this selection. Therefore, as a final modification to the near real-time 
UNCLS algorithm described above, a real-time target update was added. This update is 
based on least squares error and allows the processing algorithm to replace a target in the 
target signature matrix with the current pixel target, provided the least squares error is 
reduced by this replacement. 

Fig. 10.19 shows the classification results after completion of the algorithm using 
the two-row look-ahead procedure and a real-time target update. Compared to the 31 targets 
detected and classified in Fig. 10.18 without the real-time target update, only 19 targets 
were detected and classified in Fig. 10.19. As we can see, all five rows of panels were detected 
and properly classified using these 19 generated targets. This is a significant reduction in 
target generation. Furthermore, by adding the two-row look-ahead procedure the mixed 
pixel target selections in the first two rows of panels were avoided. 




Figure 10.19. Results after all 64 rows were processed using a two-row look-ahead procedure 
and a real-time target update 



In order to illustrate the need of the real-time target update, Fig. 10.20 shows pixel 
locations (bright white pixels) where the updates to the road and shade targets (2 and 5) 
occurred. This experiment shows that processing without this target update would have 
resulted in separation of the road target into two classification images. 
10.6 CONCLUSIONS 

Unconstrained linear unmixing has been used for hyperspectral image classification 
due to its mathematical tractability. However, the resulting abundance fractions of target 
signatures may be negative and may not sum to one. 
Therefore, unconstrained linear unmixing generally cannot be used for target 
quantification. This chapter presents two least squares methods to solve fully constrained 
linear mixing problems. One is a fully constrained least squares (FCLS) method, which 
implements two constraints, ASC and ANC, simultaneously. Since there is no closed-form 
solution for fully constrained linear mixing problems, an efficient algorithm is further 
developed to generate the desired optimal solutions. The second is a modified fully 
constrained least squares (MFCLS) method, which also implements two constraints. One 
is ASC. The other is AASC, which replaces ANC in FCLS. Unlike FCLS, MFCLS takes 
advantage of AASC to derive an iterative algorithm to find optimal solutions. Since both 
FCLS and MFCLS require complete knowledge of target signatures, an unsupervised 
least squares error-based approach is also proposed to extend them to 
unsupervised methods, UFCLS and UMFCLS. Despite a slight degradation in 
quantification performance, UFCLS and UMFCLS have the significant advantage that 
they can be used for finding and quantifying anomalies. Finally, like CEM and RXD, 
UNCLS and UFCLS can also be implemented in real time. However, since UNCLS and 
UFCLS do not have the complete target knowledge required by a real-time 
implementation, a two-row look-ahead procedure and a real-time target update are 
included to make UNCLS and UFCLS process images in near real time. A similar 
concept of near real-time processing will be discussed in Chapter 14, where the term 
on-line processing will be used instead for distinction. 
TARGET SIGNATURE-CONSTRAINED MIXED PIXEL 
CLASSIFICATION (TSCMPC): LCMV CLASSIFIERS 



The target abundance-constrained mixed pixel classification that was considered in 
Chapter 10 imposed ASC and ANC on target abundance fractions. In this case, a linear 
mixture model is required and the target signature matrix M must be also known a 
priori. In this chapter, we consider the target signature-constrained mixed pixel 
classification which does not need a linear mixture model. Instead, it classifies a mixed 
pixel by constraining the spectral shapes or vector directions of target signatures rather 
than target abundance fractions. Such a concept was explored in Chapter 4 and referred to 
as the linearly constrained minimum variance (LCMV) approach. It was used to design target 
signature-constrained subpixel detectors, CEM and TCIMF. A subpixel detector can 
detect targets, but this does not necessarily imply that it can also classify the targets it 
detected. It may occur that a detector can detect all targets of interest but cannot 
discriminate one from another. In this case, the detection rate can be as high as 100%, 
but the classification rate could be as low as 0%. In order for LCMV-based detectors to also 
achieve target classification, they must be implemented one target at a time so that the 
detected targets can be classified in a separate image. This chapter extends LCMV-based 
detectors to LCMV classifiers. It develops an approach that expands the capability of the 
LCMV-based detectors in such a fashion that it can simultaneously detect and classify 
multiple targets in a single image where different colors are used to highlight distinct 
types of detected targets. In particular, such color assignment approach also allows us to 
extend a CEM-based detector in Chapter 4 to a CEM-based classifier. Despite that an 
LCMV classifier requires the prior knowledge of desired targets, it can take advantage of 
the unsupervised algorithms presented in Chapter 5 to generate the necessary target 
knowledge and make its classification unsupervised. 



11.1 INTRODUCTION 

In Chapter 4, a CEM-based detector detected a desired target by constraining the 
spectral shape or vector direction of its signature vector $\mathbf{d}$ to the scalar constraint 
constant 1. In this case, a CEM-based detector can only detect a single target. In order to 
detect multiple targets, a CEM-based detector was recently extended to an LCMV-based 
detector (Chang and Ren, 1999) where the scalar constant 1 was replaced by a unity 
vector $\mathbf{1} = (1, 1, \ldots, 1)^T$ as a constraint vector with each component "1" in $\mathbf{1}$ used to 
constrain a specific target of interest. More recently, this LCMV detector was further 
extended to TCIMF (Ren and Chang, 2001) described in Chapter 4. It can detect a set of 
desired targets while also eliminating a set of undesired targets. Their idea expands the 
unity vector $\mathbf{1}$ to a vector made up of a unity vector $\mathbf{1}$ and a zero vector $\mathbf{0}$, i.e. 
$(\mathbf{1}^T, \mathbf{0}^T)^T = (1, 1, \ldots, 1, 0, 0, \ldots, 0)^T$, such that a component "1" is used to constrain a specific 

desired target and a component "0" is used to annihilate an undesired target. The strength 
of TCIMF is to enhance detection performance by simultaneously eliminating undesired 
targets and minimizing the interfering effects resulting from unknown signal sources. A 
third version of an LCMV-based detector to be presented in this chapter also uses a unity 
vector 1 to remove the image background as was done by the LPTD in Section 5.5. 
Interestingly, this type of the background removed-LCMV approach includes as a special 
case the filter vectors (FV) developed by Bowles et al. (1995). Although LCMV-based 
detectors can also be used for multiple-target classification, they cannot detect and 
classify multiple targets at the same time. They must be carried out in such a way that one 
type of target is detected at a time, and then different types of targets are classified in 
separate images. In order to resolve this problem, we need to find a mechanism for an 
LCMV-based detector to perform classification while the target detection takes place. One 
feasible way is to use specific colors to classify multiple targets when the targets are 
detected. During the course of the detection, targets of same type are specified by one 
particular color and targets of distinct types are discriminated by different colors. Using 
such color assignment we can also extend a CEM detector to a CEM classifier. The 
evolution from CEM through LCMV to TCIMF provides a clue for a possible solution. 
It started with a scalar constraint constant 1 for a CEM detector, then a vector constraint 
$\mathbf{1}$ for an LCMV detector, and finally arrived at a two-component constraint vector $(\mathbf{1}^T, \mathbf{0}^T)^T$ 
for a TCIMF detector. So, a natural extension of an LCMV detector may be one that 
uses a constraint matrix. However, such approach is not trivial since we are now dealing 
with a matrix rather than a vector, in which case, we need to identify what roles the rows 
and columns of the constraint matrix will play. This chapter explores this approach to 
accomplish multiple-target detection and classification in one single operation. 



11.2 LCMV CLASSIFIERS 

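In order to make an LCMV detector specified by (4.5) an LCMV classifier, we 
introduce a set of constraint vectors derived from $\mathbf{c}$, denoted by $\{\mathbf{c}_i\}_{i=1}^{p}$, where 
$\mathbf{c}_i = (0, \ldots, c_i, \ldots, 0)^T$ is a $p$-dimensional column vector with "$c_i$" in the $i$-th component 
and "0"s in all other components. Substituting $\mathbf{c}_i$ for the constraint $\mathbf{c}$ in (4.4) for 
each $i$, $1 \le i \le p$, results in the following LCMV classification problem 

$$\min_{\mathbf{w}_i} \mathbf{w}_i^T \mathbf{R}_{L \times L} \mathbf{w}_i \quad \text{subject to} \quad \mathbf{M}^T \mathbf{w}_i = \mathbf{c}_i \quad \text{for } 1 \le i \le p \qquad (11.1)$$

where $\mathbf{w}_i$ is a weight vector used to classify the specific target $t_i$ with its signature $\mathbf{m}_i$. 
The optimal solution to (11.1), $\mathbf{w}_i^{\mathrm{LCMV}}$, can be obtained by 

$$\mathbf{w}_i^{\mathrm{LCMV}} = \mathbf{R}_{L \times L}^{-1} \mathbf{M} \left( \mathbf{M}^T \mathbf{R}_{L \times L}^{-1} \mathbf{M} \right)^{-1} \mathbf{c}_i \quad \text{for each } 1 \le i \le p. \qquad (11.2)$$

A classifier that uses the set $\{\mathbf{w}_i^{\mathrm{LCMV}}\}_{i=1}^{p}$ specified by (11.2) to classify the $p$ target 
signatures in $\mathbf{M}$ is called an LCMV classifier. 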
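
A short check shows why each weight vector isolates one target. It assumes that (11.2) takes the standard LCMV form $\mathbf{w}_i^{\mathrm{LCMV}} = \mathbf{R}_{L\times L}^{-1}\mathbf{M}(\mathbf{M}^T\mathbf{R}_{L\times L}^{-1}\mathbf{M})^{-1}\mathbf{c}_i$ and that $\mathbf{M}^T\mathbf{R}_{L\times L}^{-1}\mathbf{M}$ is invertible. Substituting into the constraint of (11.1) gives

```latex
\mathbf{M}^T \mathbf{w}_i^{\mathrm{LCMV}}
  = \left(\mathbf{M}^T \mathbf{R}_{L\times L}^{-1} \mathbf{M}\right)
    \left(\mathbf{M}^T \mathbf{R}_{L\times L}^{-1} \mathbf{M}\right)^{-1} \mathbf{c}_i
  = \mathbf{c}_i ,
\qquad \text{so that} \qquad
\mathbf{m}_j^T \mathbf{w}_i^{\mathrm{LCMV}} =
  \begin{cases} c_i, & j = i, \\ 0, & j \neq i. \end{cases}
```

In other words, the $i$-th weight vector passes the $i$-th signature with gain $c_i$ and nulls the other $p-1$ signatures, which is what allows the $p$ filter outputs to be displayed as distinct classes.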

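As noted in Section 5.5, $\delta_{\mathrm{RXD\text{-}UTD}}(\mathbf{r})$ improved its detection performance 
by background subtraction where a unity vector $\mathbf{1}$ was implemented to 
extract the image background. The same idea can also be applied to an LCMV classifier 
to improve its classification performance. In doing so, we first augment the $p$-target 
signature matrix $\mathbf{M}$ to $\tilde{\mathbf{M}} = [\mathbf{m}_1\ \mathbf{m}_2 \cdots \mathbf{m}_p\ \mathbf{1}_{L\times 1}]$, which includes an $L$-dimensional unity 
vector $\mathbf{1}_{L\times 1} = (1,1,\ldots,1)^T$ as the last column of $\tilde{\mathbf{M}}$ for the purpose of background 
annihilation. 

Following a similar way that an LCMV classifier was derived in (11.2), we define 
$\tilde{\mathbf{c}}_i = (\mathbf{c}_i^T, 0)^T = (0,\ldots,c_i,\ldots,0,0)^T$ as a $(p+1)$-dimensional column vector with "$c_i$" in the 
$i$-th component and "0"s in all other components. Substituting $\tilde{\mathbf{M}}$ and $\tilde{\mathbf{c}}_i$ for $\mathbf{M}$ and the 
constraint $\mathbf{c}_i$ in (11.1) yields a background-removed LCMV classification problem 

$$\min_{\mathbf{w}_i} \mathbf{w}_i^T\mathbf{R}_{L\times L}\mathbf{w}_i \quad \text{subject to} \quad \tilde{\mathbf{M}}^T\mathbf{w}_i = \tilde{\mathbf{c}}_i \ \text{ for } 1 \le i \le p \qquad (11.3)$$

where the optimal solution to (11.3), $\mathbf{w}_i^{\mathrm{BRLCMV}}$, can be obtained by 

$$\mathbf{w}_i^{\mathrm{BRLCMV}} = \mathbf{R}_{L\times L}^{-1}\tilde{\mathbf{M}}\left(\tilde{\mathbf{M}}^T\mathbf{R}_{L\times L}^{-1}\tilde{\mathbf{M}}\right)^{-1}\tilde{\mathbf{c}}_i. \qquad (11.4)$$

A classifier that uses the set of $\{\mathbf{w}_i^{\mathrm{BRLCMV}}\}_{i=1}^{p}$ specified by (11.4) to classify the $p$ target 
signatures in $\mathbf{M}$ is called a background-removed LCMV (BRLCMV) classifier. 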
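
The closed forms (11.2) and (11.4) can be checked numerically. The following is a minimal NumPy sketch, not the implementation used for the experiments in this chapter: the signatures and pixels are random placeholders, and the function names `lcmv_weights` and `brlcmv_weights` are hypothetical. It assumes the sample correlation matrix is invertible and that each constraint uses $c_i = 1$.

```python
import numpy as np

def lcmv_weights(R, M, C):
    """Return W = R^{-1} M (M^T R^{-1} M)^{-1} C, one column per constraint vector."""
    Rinv_M = np.linalg.solve(R, M)                 # R^{-1} M, shape (L, p)
    return Rinv_M @ np.linalg.solve(M.T @ Rinv_M, C)

def brlcmv_weights(R, M, C):
    """Append a unity column to M and a trailing 0 to each constraint vector."""
    L = M.shape[0]
    M_aug = np.hstack([M, np.ones((L, 1))])              # [m_1 ... m_p 1]
    C_aug = np.vstack([C, np.zeros((1, C.shape[1]))])    # (c_i^T, 0)^T
    return lcmv_weights(R, M_aug, C_aug)

rng = np.random.default_rng(0)
L, p, N = 20, 3, 500
M = rng.uniform(0.1, 1.0, size=(L, p))   # placeholder signatures (assumption)
X = rng.uniform(0.0, 1.0, size=(N, L))   # placeholder pixel vectors, one per row
R = X.T @ X / N                          # sample correlation matrix
C = np.eye(p)                            # c_i has a 1 in the i-th component
W_lcmv = lcmv_weights(R, M, C)
W_br = brlcmv_weights(R, M, C)
```

Each column of `W_lcmv` passes its own signature with gain 1 and nulls the others; each column of `W_br` additionally nulls the unity vector.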

11.3 BOWLES ET AL.’S FILTER VECTORS (FV) ALGORITHM 

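Recently, Bowles et al. developed a filter-vectors (FV) approach to hyperspectral data 
analysis (Bowles et al., 1995). Their idea is to construct a set of filter vectors to unmix 
targets of interest in an image scene, of which each filter vector is used to classify a 
specific target. Like the linear unmixing methods studied in Chapter 7, the FV approach also 
requires a linear mixture model described by (3.1) to design a set of $L$-dimensional filter 
vectors, $\{\mathbf{w}_i\}_{i=1}^{p}$, that must satisfy the following three conditions, 

$$\mathbf{w}_i^T\mathbf{m}_j = \delta_{ij} \qquad (11.5)$$

$$\mathbf{w}_i^T\mathbf{1}_{L\times 1} = 0 \quad \text{for each } 1 \le i \le p \qquad (11.6)$$

$$\mathbf{w}_i^T\mathbf{w}_i \ \text{ is a minimum for each } 1 \le i \le p. \qquad (11.7)$$

The solution derived by the FV approach consists of $p$ filter vectors, $\{\mathbf{w}_i^{\mathrm{FV}}\}_{i=1}^{p}$, specified 
by the matrix $\mathbf{W}^{\mathrm{FV}} = [\mathbf{w}_1^{\mathrm{FV}} \cdots \mathbf{w}_p^{\mathrm{FV}}]$ that can be obtained by 

$$\mathbf{W}^{\mathrm{FV}} = \hat{\mathbf{M}}\mathbf{S}^{-1} \qquad (11.8)$$

where 

$$\hat{\mathbf{M}} = [\hat{\mathbf{m}}_1 \cdots \hat{\mathbf{m}}_p] \ \text{ with } \ \hat{\mathbf{m}}_i = \mathbf{m}_i - \tfrac{1}{L}\left(\mathbf{1}_{L\times 1}^T\mathbf{m}_i\right)\mathbf{1}_{L\times 1} \qquad (11.9)$$

$$s_{ij} = \hat{\mathbf{m}}_i^T\hat{\mathbf{m}}_j \ \text{ and } \ \mathbf{S} = [s_{ij}]_{p\times p}. \qquad (11.10)$$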

Solving (11.5)-(11.7) is equivalent to solving the following constrained 
optimization problem: 

$$\min_{\mathbf{w}_i} \mathbf{w}_i^T\mathbf{w}_i \quad \text{for each } 1 \le i \le p \qquad (11.11)$$

subject to 

$$\mathbf{w}_i^T\mathbf{1}_{L\times 1} = 0 \quad \text{and} \quad \mathbf{w}_i^T\mathbf{m}_j = \delta_{ij}. \qquad (11.12)$$

Interestingly, if we examine closely the two constraints specified by (11.12), we will find 
that the constraint $\mathbf{w}_i^T\mathbf{m}_j = \delta_{ij}$ (i.e., (11.5)) can be taken care of by the set of constraint vectors 
$\{\tilde{\mathbf{c}}_i\}_{i=1}^{p}$ given by (11.3) that is used in BRLCMV by letting $c_i = 1$ for each $1 \le i \le p$. 
The constraint $\mathbf{w}_i^T\mathbf{1}_{L\times 1} = 0$ (i.e., (11.6)) can be satisfied by including $\mathbf{1}_{L\times 1}$ in $\tilde{\mathbf{M}}$ 
with the last component of $\tilde{\mathbf{c}}_i$ equal to 0 for $1 \le i \le p$. Now if we further replace $\mathbf{R}_{L\times L}$ in (11.3) with the identity 
matrix $\mathbf{I}_{L\times L}$, the BRLCMV classification problem specified by (11.3) becomes 

$$\min_{\mathbf{w}_i} \mathbf{w}_i^T\mathbf{w}_i \quad \text{subject to} \quad \tilde{\mathbf{M}}^T\mathbf{w}_i = \tilde{\mathbf{c}}_i \ \text{ for each } 1 \le i \le p. \qquad (11.13)$$

Comparing the constrained classification problem specified by (11.13) to Bowles et al.'s 
filter vectors problem outlined by (11.5)-(11.7) (i.e., (11.11)-(11.12)), these two 
constrained optimization problems turn out to be identical. This implies that Bowles et 
al.'s FV classifier is a special case of the BRLCMV classifier, where it can be 
considered a whitened BRLCMV with $\mathbf{R}_{L\times L} = \mathbf{I}_{L\times L}$ where the data are zero-mean and 
whitened. That is, Bowles et al.'s FV classifier operates pixel-by-pixel without 
accounting for spectral correlation among sample vectors, while (11.3) takes advantage of 
spectral correlation among samples by including $\mathbf{R}_{L\times L}$ in the optimization problem. As 
will be shown in experiments, the BRLCMV performs significantly better than the FV 
approach. 
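
The claimed equivalence between (11.8) and the identity-matrix case (11.13) can be verified numerically. The sketch below is illustrative only: it assumes the filter vectors are built from mean-removed signatures (each signature minus its own band average), uses random placeholder signatures, and the function names are hypothetical.

```python
import numpy as np

def fv_filter_vectors(M):
    """Filter vectors W = M_hat S^{-1} built from mean-removed signatures."""
    M_hat = M - M.mean(axis=0, keepdims=True)   # subtract each signature's band mean
    S = M_hat.T @ M_hat                         # s_ij = m_hat_i^T m_hat_j
    return M_hat @ np.linalg.inv(S)

def brlcmv_identity(M):
    """BRLCMV weights with R replaced by the identity and c_i = 1."""
    L, p = M.shape
    M_aug = np.hstack([M, np.ones((L, 1))])             # signatures plus unity column
    C_aug = np.vstack([np.eye(p), np.zeros((1, p))])    # constraint vectors with trailing 0
    return M_aug @ np.linalg.solve(M_aug.T @ M_aug, C_aug)

rng = np.random.default_rng(1)
M = rng.uniform(0.1, 1.0, size=(30, 4))   # placeholder signatures (assumption)
W_fv = fv_filter_vectors(M)
W_br = brlcmv_identity(M)
```

Both matrices are the minimum-norm solution of the same linear constraints, which is why they agree column by column.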



11.4 COLOR ASSIGNMENT OF LCMV CLASSIFIERS 

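The LCMV described in Chapter 4 can be used to detect targets of interest by 
simultaneously passing these targets through a filter constrained by (4.1). Unfortunately, 
it also suffered from the disadvantage that it could not differentiate these detected targets 
from one another. To expand the capability of an LCMV detector to an LCMV classifier, we 
further augment the constraint vector $\mathbf{c}$ in (4.1) to a constraint matrix $\mathbf{C}$. 

More precisely, assume that the size of a constraint matrix $\mathbf{C}$ is $k \times p$ and denoted 
by $\mathbf{C} = [\mathbf{c}_1\ \mathbf{c}_2 \cdots \mathbf{c}_p]$. In this case, the weight vector $\mathbf{w}$ in (4.1) is also expanded to an 
$L \times p$ weight matrix $\mathbf{W} = [\mathbf{w}_1\ \mathbf{w}_2 \cdots \mathbf{w}_p]$, which is formed by $p$ weight vectors 
$\mathbf{w}_1, \mathbf{w}_2, \ldots, \mathbf{w}_p$ and is represented by 

$$\mathbf{M}^T\mathbf{W} = \mathbf{C} \qquad (11.14)$$

The constraint matrix $\mathbf{C}$ plays a significant role in multiple-target classification. It makes 
use of each $k \times 1$ column vector $\mathbf{c}_i$ for $1 \le i \le p$ to detect a specific type of targets and 
each row vector of $\mathbf{C}$ to assign a particular color to the detected targets so as to achieve 
target classification. In this case, we are required to solve separately a set of the following 
constrained optimization problems 

$$\min_{\mathbf{w}_i} \mathbf{w}_i^T\mathbf{R}_{L\times L}\mathbf{w}_i \quad \text{subject to} \quad \mathbf{M}^T\mathbf{w}_i = \mathbf{c}_i \ \text{ for } 1 \le i \le p. \qquad (11.15)$$

Using the constraint equation (11.14) and (11.2), which is a solution to (11.15), the 
optimal solution $\mathbf{W}^{\mathrm{LCMV}} = [\mathbf{w}_1^{\mathrm{LCMV}}\ \mathbf{w}_2^{\mathrm{LCMV}} \cdots \mathbf{w}_p^{\mathrm{LCMV}}]$ can be obtained by the following 
matrix form 

$$\mathbf{W}^{\mathrm{LCMV}} = \mathbf{R}_{L\times L}^{-1}\mathbf{M}\left(\mathbf{M}^T\mathbf{R}_{L\times L}^{-1}\mathbf{M}\right)^{-1}\mathbf{C}. \qquad (11.16)$$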

The use of the weight matrix $\mathbf{W}^{\mathrm{LCMV}}$ in (11.16) is two-fold. It detects desired targets, and 
at the same time it also classifies targets with the pre-selected colors assigned by row 
vectors in the constraint matrix $\mathbf{C}$. More specifically, the column dimensionality of $\mathbf{C}$, $p$, 
specifies the number of colors to be used to classify targets, whereas the row 
dimensionality of $\mathbf{C}$, $k$, is the number of targets of interest needed to be classified. For a 
TCIMF-based detector in Chapter 4 to be implemented as a TCIMF classifier, we only 
have to substitute $[\mathbf{D}\ \mathbf{U}]$ in (4.9) for the $\mathbf{M}$ in (11.16). 

To further illustrate the above idea, let us assume that there are $k$ targets to be 
classified into $p$ classes. In this case, the constraint matrix $\mathbf{C} = [\mathbf{c}_1\ \mathbf{c}_2 \cdots \mathbf{c}_p]$ given by 
(11.14) must have size $k \times p$ where each of the column vectors $\mathbf{c}_i$ for $1 \le i \le p$ has the same 
dimensionality, $k$, and corresponds to the constraint vector of an LCMV classifier 
specified by (11.1). It should be noted that $k$ is not necessarily equal to $p$. For example, 
if there are two targets of the same type which have slightly different but similar spectral 
signatures, they must be classified into the same class. So, in this case, $p < k$. On the 
other hand, if partial knowledge of natural background signatures is available, it generally 
presents interference to target detection and classification, and can be considered as 
undesired signatures. The elimination of such interference can enhance target detection 
and classification by virtue of TCIMF. These interfering signatures are annihilated by a 
zero vector $\mathbf{0}$ in the constraint matrix $\mathbf{C}$. Since all the detected $k$ targets must be 
classified into $p$ distinct classes, it is impossible to show their classification with gray 
levels in one image. This difficulty can be resolved if the detected targets of distinct 
types are assigned different colors. Surprisingly, the row vectors in the constraint 
matrix $\mathbf{C}$ can be used for this purpose as illustrated in the following. 

Suppose that there are five targets, $\mathbf{t}_1, \mathbf{t}_2, \mathbf{t}_3, \mathbf{t}_4, \mathbf{t}_5$, which belong to three types of 
objects, denoted by $O_1(\mathbf{t}_1)$, $O_2(\mathbf{t}_2,\mathbf{t}_3)$, $O_3(\mathbf{t}_4,\mathbf{t}_5)$, and two background signatures $\mathbf{t}_6, \mathbf{t}_7$ 
which may be available from partial knowledge and can be used to represent undesired 
signatures. In this case, $k = 7$ and $p = 3$. If we assume that red, green, blue and black 
colors will be used to specify these three types of target classes and one class of 
background signatures, then the target signature matrix is formed by seven target 
signatures, $\mathbf{M} = [\mathbf{m}_1\ \mathbf{m}_2 \cdots \mathbf{m}_7]$, where $\{\mathbf{m}_i\}_{i=1}^{7}$ are the spectral signatures of the 7 
targets, $\mathbf{t}_1, \mathbf{t}_2, \ldots, \mathbf{t}_7$, respectively. Then the constraint matrix $\mathbf{C}$ can be specified by 

$$\mathbf{C}^T = [\mathbf{c}_1\ \mathbf{c}_2\ \mathbf{c}_3]^T = \begin{bmatrix} 1 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 1 & 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & 1 & 0 & 0 \end{bmatrix} \qquad (11.17)$$

with columns 1 through 7 of $\mathbf{C}^T$ labeled red, green, green, blue, blue, black and black, respectively, 
where $\mathbf{c}_1 = (1,0,0,0,0,0,0)^T$, $\mathbf{c}_2 = (0,1,1,0,0,0,0)^T$ and $\mathbf{c}_3 = (0,0,0,1,1,0,0)^T$ 
are constraint vectors to detect the 7 targets. The three colors, red, green and blue, are used 
to specify the three types of objects $O_1$, $O_2$ and $O_3$ respectively. The "1" in row 1 of $\mathbf{c}_1$ is 
used to detect $\mathbf{t}_1$, the two "1"s in rows 2 and 3 of $\mathbf{c}_2$ are used to detect $\mathbf{t}_2, \mathbf{t}_3$ and the 
two "1"s in rows 4 and 5 of $\mathbf{c}_3$ are used to detect $\mathbf{t}_4, \mathbf{t}_5$. As noticed in (11.17), rows 
6 and 7 in $\mathbf{C}$ (i.e., columns 6 and 7 in $\mathbf{C}^T$) are zero vectors which will be used to 
eliminate the background signatures $\mathbf{t}_6, \mathbf{t}_7$ by the assigned black color. The reason we 
included the background signatures $\mathbf{t}_6, \mathbf{t}_7$ in the above illustration is that in many 
practical applications, there is always some partial knowledge about the image background. 
With the availability of this information we should take advantage of it. If there is no 
such knowledge available, the background signatures $\mathbf{t}_6, \mathbf{t}_7$ in (11.17) would be absent 
and would be considered as unknown interferers rather than undesired signatures. In 
this case, the size of $\mathbf{C}$ would be reduced to $k = 5$ and $p = 3$. It is worth noting that 
for visual discrimination the colors chosen to classify targets should be as distinct as 
possible. Nevertheless, they can be arbitrary. More importantly, if the color of a pixel is 
a mixture of different colors, it may suggest that this particular pixel could be mixed by 
the materials represented by these colors. Such valuable color information of how a pixel 
is mixed cannot be provided by an LCMV detector. 
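
The color assignment of this section can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the seven signatures and the mixed pixels are random placeholders, the helper name `lcmv_color_map` is hypothetical, and the outputs of the three class filters are simply clipped to [0, 1] and blended with red, green and blue. Background signatures carry zero rows in the constraint matrix, so they contribute no color and appear black.

```python
import numpy as np

def lcmv_color_map(X, M, C, colors):
    """Classify pixels X (N, L) with W = R^{-1} M (M^T R^{-1} M)^{-1} C and
    blend one RGB color per column of C according to the filter outputs."""
    R = X.T @ X / X.shape[0]                        # sample correlation matrix
    Rinv_M = np.linalg.solve(R, M)
    W = Rinv_M @ np.linalg.solve(M.T @ Rinv_M, C)   # (L, p) weight matrix
    A = np.clip(X @ W, 0.0, 1.0)                    # (N, p) class outputs
    return W, np.clip(A @ colors, 0.0, 1.0)         # (N, 3) RGB values

# Seven signatures t1..t7 mapped to three classes, following the layout of (11.17).
C = np.array([[1, 0, 0],
              [0, 1, 0],
              [0, 1, 0],
              [0, 0, 1],
              [0, 0, 1],
              [0, 0, 0],
              [0, 0, 0]], dtype=float)
colors = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1]], dtype=float)  # red, green, blue

rng = np.random.default_rng(2)
L = 30
M = rng.uniform(0.1, 1.0, size=(L, 7))          # placeholder signatures (assumption)
abund = rng.dirichlet(np.ones(7), size=300)     # placeholder mixing fractions
X = abund @ M.T + 0.01 * rng.standard_normal((300, L))
W, rgb = lcmv_color_map(X, M, C, colors)
```

A pixel dominated by one class shows a nearly pure color, whereas a mixed pixel shows a blend, which is the visual cue described above.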



11.5 EXTENSION OF CEM FILTER TO CLASSIFIERS 

As noted previously, if a CEM detector is used as a classifier, it must be 
implemented for one target at a time so that distinct types of targets can be classified in 
separate images. In order to classify all different targets in a single image, the separate 
images produced by a set of CEM detectors must be combined into a single image 
where each CEM-detected target is highlighted by a different color. Three ways of 
extending a CEM detector to a classifier can be derived by such target combination. 

11.5.1 Winner-Take-All CEM (WTACEM) Classifier 

Let $\mathbf{r}$ denote an image pixel and $\{\mathrm{CEM}_i(\mathbf{r})\}_{i=1}^{p}$ be $p$ target abundance fractional 
images generated by $p$ CEM detectors, each of which is used to detect one specific target. 
One CEM-derived classifier is the Winner-Take-All CEM classifier, referred to as 
WTACEM($\mathbf{r}$), which is similar to WTAMPC specified by (8.2). For each image pixel $\mathbf{r}$, 
it is defined by 

$$\mathrm{WTACEM}(\mathbf{r}) = \max_{1 \le i \le p} \mathrm{CEM}_i(\mathbf{r}) \qquad (11.18)$$

which uses the winner-take-all rule to determine the gray scale of $\mathbf{r}$. 

11.5.2 Sum CEM (SCEM) Classifier 

Another CEM-derived classifier is the Sum CEM classifier, referred to as SCEM($\mathbf{r}$), 
defined by 

$$\mathrm{SCEM}(\mathbf{r}) = \sum_{i=1}^{p} \mathrm{CEM}_i(\mathbf{r}) \qquad (11.19)$$

which simply sums up the $p$ abundance fractions of each image pixel $\mathbf{r}$. 

11.5.3 Multiple-Target CEM (MTCEM) Classifier 

If the constraints $\mathbf{c}_i$ for $1 \le i \le p$ in (11.15) are replaced with the same scalar 
constant "1" for $1 \le i \le p$, the resulting LCMV, referred to as the multiple-target CEM 
(MTCEM) classifier, can be viewed as a set of $p$ CEM detectors implemented 
simultaneously by solving the following constrained optimization problem 

$$\min_{\mathbf{w}_i} \mathbf{w}_i^T\mathbf{R}_{L\times L}\mathbf{w}_i \quad \text{subject to} \quad \mathbf{M}^T\mathbf{w}_i = 1 \ \text{ for } 1 \le i \le p, \qquad (11.20)$$

which is a special case of (11.15). 
11.5.4 Target-Constrained Interference-Minimized (TCIM) Classifier 

Like CEM, TCIMF can also be extended to a classifier, referred to as the TCIM 
classifier. It implements a constraint matrix $\mathbf{C}$ in (11.14) to highlight different types of 
targets detected by TCIMF. The example using (11.17) for illustration in Section 11.4 is 
actually a color implementation of a TCIM classifier. Such a TCIM classifier can be 
considered as a generalization of MTCEM, which includes a set of zero vectors to 
eliminate undesired target signatures. 

It should be noted that the three CEM extensions, WTACEM, SCEM and MTCEM, 
plus the TCIM classifier are all special cases of the LCMV classifier specified by (11.15). 
They all generate a single image to classify multiple targets using a color assignment to 
highlight detected targets. For the WTACEM classifier, the color to be used to highlight 
WTACEM($\mathbf{r}$) is the same color that was assigned to the $\mathrm{CEM}_{i^*}(\mathbf{r})$ detector where 
$i^* = \arg\{\max_{1 \le i \le p} \mathrm{CEM}_i(\mathbf{r})\}$ is the solution to (11.18). For the SCEM classifier, the 
color of SCEM($\mathbf{r}$) will be a mixture of the colors that are assigned to specify the $p$ 
CEM-detected target signatures. One advantage of using SCEM is that its mixed color 
classification map provides different degrees of mixture in image pixel vectors. For 
instance, if a pixel is mixed by three target signatures with equal abundance fractions 
specified by red, green and blue, the resulting (red, green, blue)-mixed color of the pixel 
will be black. As another extreme, if a mixed pixel is a pure pixel, the color representing 
this pixel will be a pure color. For MTCEM and TCIM, their color assignment will be 
carried out by the procedure described in Section 11.4. In particular, when TCIMF is 
implemented with a color assignment for classification, it becomes a TCIM classifier, and 
the MTCEM classifier turns out to be its special version without annihilation of 
undesired target signatures. 
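
The WTACEM and SCEM rules in (11.18) and (11.19) operate on the $p$ CEM abundance images. The sketch below is a hedged illustration: it uses the standard CEM filter $\mathbf{w}_i = \mathbf{R}^{-1}\mathbf{m}_i/(\mathbf{m}_i^T\mathbf{R}^{-1}\mathbf{m}_i)$, random placeholder data, hypothetical function names, and one possible way (an assumption, not the author's procedure) of turning the outputs into RGB colors.

```python
import numpy as np

def cem_images(X, M):
    """Return the p CEM outputs CEM_i(r) = m_i^T R^{-1} r / (m_i^T R^{-1} m_i)."""
    R = X.T @ X / X.shape[0]
    Rinv_M = np.linalg.solve(R, M)                  # (L, p)
    scale = np.einsum("lp,lp->p", M, Rinv_M)        # m_i^T R^{-1} m_i for each i
    return X @ (Rinv_M / scale)                     # (N, p)

def wtacem(cem):
    """Gray level and winning index i* = argmax_i CEM_i(r), following (11.18)."""
    return cem.max(axis=1), cem.argmax(axis=1)

def scem(cem):
    """Sum of the p abundance fractions of each pixel, following (11.19)."""
    return cem.sum(axis=1)

rng = np.random.default_rng(3)
L, p, N = 25, 3, 400
M = rng.uniform(0.1, 1.0, size=(L, p))      # placeholder signatures (assumption)
X = rng.uniform(0.0, 1.0, size=(N, L))      # placeholder pixels
X[:p] = M.T                                 # plant one pure pixel per signature
cem = cem_images(X, M)
gray, winner = wtacem(cem)
total = scem(cem)
palette = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1]], dtype=float)
wta_rgb = gray[:, None].clip(0, 1) * palette[winner]    # color of the winning detector
scem_rgb = (cem.clip(0, 1) @ palette).clip(0, 1)        # mixture of the p colors
```

In this sketch a WTACEM pixel always carries a single palette color scaled by its gray level, while an SCEM pixel carries a blend of the colors of all detectors that respond to it.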

As a final concluding remark, it is worth noting that according to (11.13) Bowles et 
al.'s FV classifier discussed in Section 11.3 can be considered as a whitened TCIM 
classifier with $\mathbf{R}_{L\times L} = \mathbf{I}_{L\times L}$. 

11.6 COMPUTER SIMULATIONS 

In this section, computer simulations were conducted to demonstrate the performance 
of various LCMV classifiers. The data set used was the five AVIRIS reflectance 
signatures shown in Fig. 1.5. Since the spectral signatures of blackbrush, creosote leaves 
and sagebrush are similar, we chose these three targets to evaluate the detection 
performance of the LCMV classifier. A set of 400 mixed pixels was simulated. Red soil 
and dry grass were used to simulate background pixels with a 50%-50% split. In addition, 
Gaussian noise was added to each pixel to achieve a 30:1 signal-to-noise ratio. In 
order to implant the three targets in these 400 pixels, we replaced the 50th, 100th, 
150th and 200th pixels with 20%, 40%, 60% and 80% blackbrush pixels respectively, with the 
remaining abundance split evenly between red soil and dry grass. For instance, the 100th pixel 
contains 40% blackbrush, 30% red soil and 30% dry grass. Similarly, the 250th, 300th, 
350th and 400th pixels were replaced with 20%, 40%, 60% and 80% creosote leaves pixels 
respectively, with the remaining abundance split evenly between red soil and dry grass. The 
25th, 125th, 225th and 325th pixels were replaced with 20%, 40%, 60% and 80% sagebrush 
pixels respectively, with the remaining abundance split evenly between red soil and dry grass. 
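
The simulated scene described above can be generated as follows. This is a sketch under stated assumptions: the five signatures are random placeholders standing in for the AVIRIS reflectances of Fig. 1.5, the number of bands is arbitrary, the helper `implant` is hypothetical, and the 30:1 signal-to-noise ratio is interpreted as 50% reflectance divided by the noise standard deviation.

```python
import numpy as np

names = ["blackbrush", "creosote leaves", "sagebrush", "red soil", "dry grass"]
rng = np.random.default_rng(4)
L, N = 50, 400
# Placeholder signatures (assumption); the text uses the reflectances of Fig. 1.5.
S = rng.uniform(0.05, 0.6, size=(5, L))

# Background: 50% red soil and 50% dry grass in every pixel.
abund = np.zeros((N, 5))
abund[:, 3:] = 0.5

def implant(target, pixel_numbers, fractions=(0.2, 0.4, 0.6, 0.8)):
    """Place `target` at the given 1-based pixel numbers; split the rest evenly."""
    for n, f in zip(pixel_numbers, fractions):
        abund[n - 1] = 0.0
        abund[n - 1, target] = f
        abund[n - 1, 3:] = (1.0 - f) / 2.0

implant(0, (50, 100, 150, 200))     # blackbrush
implant(1, (250, 300, 350, 400))    # creosote leaves
implant(2, (25, 125, 225, 325))     # sagebrush

clean = abund @ S                   # noise-free mixed pixels, shape (400, L)
# Assumed SNR convention: 50% reflectance over the noise standard deviation.
sigma = 0.5 / 30.0
X = clean + sigma * rng.standard_normal(clean.shape)
```

Every row of `abund` sums to one, so each simulated pixel follows the linear mixture model before noise is added.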

Fig. 11.1 shows the detection and classification results of the LCMV, BRLCMV, 
FV and OSP classifiers with M = [blackbrush, creosote leaves, sagebrush]. 

Figure 11.1. Classification results of the LCMV, BRLCMV, FV and OSP classifiers (from left to right) with 
M = D = [blackbrush, creosote leaves, sagebrush] 

As shown in Fig. 11.1, the LCMV and the BRLCMV classifiers performed 
comparably while the FV and the OSP classifiers performed very similarly but 
completely failed in classification of the three targets. However, if we further included the 
background signatures in the target signature matrix M and used U = [dry grass, red soil] 
as the undesired target signatures for the LCMV and the BRLCMV classifiers, the 
performance of the FV and the OSP classifiers was significantly improved as shown in 
Fig. 11.2. Since the signatures of red soil and dry grass are different from those of the three 
targets, removal of undesired target signatures did not have much impact on the 
performance of the LCMV and the BRLCMV classifiers. This is not true for the FV and 
OSP classifiers. Unlike Fig. 11.1, the FV and the OSP classifiers were able to detect all 
three targets and even performed slightly better than the LCMV and the BRLCMV 
classifiers. The reason for this was that the FV and the OSP classifiers had the complete 
target knowledge that was required for the linear mixture model. In Fig. 11.1, the 
background signatures, red soil and dry grass, accounting for most of the abundance, were 
not included. As a result, the FV and the OSP classifiers performed poorly. By contrast, 
the FV and the OSP classifiers worked very effectively in Fig. 11.2 because the target 
signature matrix M provided complete target information. 

Figure 11.2. Classification results of the LCMV, BRLCMV, FV and OSP classifiers (from left to right) with 
M = [D U] = [blackbrush, creosote leaves, sagebrush, dry grass, red soil] 

Although both the FV and the OSP classifiers performed very similarly in terms 
of target detection and classification, their detected abundance fractions were very 
different. It should be noted that a different scale was used to show the detectability of 
the OSP classifier in Fig. 11.2. Compared to the FV classifier, the OSP classifier did 
not accurately estimate abundance fractions for the targets it detected. As demonstrated in 
Chapters 6 and 7, in order for the OSP to produce more accurate abundance fraction 
estimates, the constant $\kappa = (\mathbf{d}^T P_{\mathbf{U}}^{\perp}\mathbf{d})^{-1}$ must be included in OSP in (3.9). The resulting 
OSP classifier was referred to as the oblique subspace projection classifier (OBSP) or 
unconstrained Gaussian maximum likelihood estimation classifier (UGMLC) in Chapter 
8. Fig. 11.3 shows the classification result of the same experiment done for Fig. 11.2 for 
the FV, the OSP and the OBSP classifiers at the same scale. As we can see from Fig. 
11.3, the FV and the OBSP classifiers performed nearly the same while the OSP 
performed very poorly after the detected abundance fractions were scaled to the same 
range used for the FV and the OBSP classifiers. Interestingly, the experiment conducted 
in Fig. 11.3 also demonstrated that the FV classifier performed no better than the OBSP 
classifier, an a posteriori OSP classifier discussed in Chapter 8. 




Figure 11.3. Classification results of the FV, OSP and OBSP classifiers (from left to right) with M = 
[blackbrush, creosote leaves, sagebrush, dry grass, red soil] 

In order to see how an LCMV classifier classifies targets of interest in a single 
image, Fig. 11.4 shows the classification results produced by WTACEM, MTCEM, 
BRLCMV and LCMV. These results were obtained by combining the three figures in 
Fig. 11.1 into a single figure where red, green, and blue are used to highlight the three 
target signatures, blackbrush, creosote leaves and sagebrush. The SCEM result is not 
shown in Fig. 11.4 because its plots did not show its advantage as clearly as 
Fig. 11.9(e), where the mixed colors of an image pixel can really show the degrees of 
abundance fractions of targets detected in the pixel. 

Figure 11.4. Single-plot classifications of the LCMV, BRLCMV, WTACEM and MTCEM classifiers with M 
= D = [blackbrush, creosote leaves, sagebrush] 

Similar experiments were also conducted for Fig. 11.2, which included two undesired 
target signatures, red soil and dry grass. Fig. 11.5 shows the classification results 
produced by the LCMV and BRLCMV classifiers by merging the three target signatures 
detected in Fig. 11.2 into one single figure, while eliminating background signatures. 
This resulting LCMV is actually the TCIM classifier. Since the WTACEM and MTCEM 
did not require knowledge of red soil and dry grass, their classification results in Fig. 
11.4 remain unchanged and are not included in Fig. 11.5. 






(a) BRLCMV 

Figure 11.5. Single-plot classifications of the BRLCMV and TCIM classifiers with M = [D U] = [blackbrush, 
creosote leaves, sagebrush, dry grass, red soil] 

11.7 HYPERSPECTRAL IMAGE EXPERIMENTS 

Since there was no sample spectral correlation included in the conducted computer 
simulations, the results did not show the advantage of the sample correlation matrix 
$\mathbf{R}_{L\times L}$ used in the LCMV and BRLCMV classifiers (see (11.15)). In order to see how 
much improvement results from the inclusion of sample spectral correlation, we conducted 
real image experiments using the AVIRIS image in Fig. 1.6(a) and the 15-panel scene in 
Fig. 1.7 for the LCMV, the BRLCMV, the FV and the OSP classifiers. As noted in 
Fig. 11.3, the OSP classifier did not detect correct amounts of target abundance. 
Nevertheless, as demonstrated in Chapter 8, the OSP and a posteriori OSP classifiers 
did produce very close results in classification because their generated abundance 
fractional images were scaled to 256 gray levels. So, only the OSP classifier was used for 
comparison in the following hyperspectral image experiments. Fig. 11.6 shows the 
AVIRIS classification results of cinders, rhyolite, playa, shade and vegetation. 

From Fig. 11.6, we can see that when the LCVF AVIRIS data were used for the 
experiments, the LCMV and the BRLCMV classifiers performed slightly better than did 
the FV and the OSP, particularly for vegetation and shade. However, when the 15-panel 
HYDICE scene was used for experiments, the classification results changed significantly. 
Fig. 11.7 shows their classification results of the 15 panels. It shows that the LCMV 
and the BRLCMV classifiers performed significantly better than did the FV and the OSP 
classifiers for all the 15 panels, particularly the panels in rows 1 and 4. This is because the 
former used $\mathbf{R}_{L\times L}$ to account for spatial correlation while the latter discarded it by 
operating on individual image pixels independently, i.e., $\mathbf{R}_{L\times L} = \mathbf{I}_{L\times L}$. Since the HYDICE 
image has 1.5-m spatial resolution compared to the 20-m spatial resolution of the AVIRIS 
data, the sample spectral correlation plays a crucial role in classification and 
the LCMV and the BRLCMV classifiers certainly took advantage of it. 




(a) LCMV (b) BRLCMV (c) FV (d) OSP 
Based on the ground truth map provided by Fig. 1.7(b) we can actually tally the 
number of target pixels that are correctly detected and classified by a particular method. 
Since the LCMV classifiers only generate the abundance fractions of the specific targets 
to be classified and not for all the targets, the winner-take-all thresholding approach using 
the WTAMPCV specified by (9.2) in Chapter 9 is not applicable. In this case, the 
a%MPCV specified by (9.3) in Chapter 9 was used to segment targets from the image 
background. Tables 11.1(a-d) and 11.2(a-d) tabulate the detection and classification results for 
the four classifiers, LCMV, BRLCMV, FV and OSP, where the 50%MPCV and 
25%MPCV were used respectively to produce the following tables. 



Table 11.1(a). Detection and classification rates of LCMV using 50% abundance cut-off threshold 

       N_R    N_RD   R_RD     R_C      N_FA
P1     3      2      0.6667   0.6667   2
P2     4      3      0.7500   0.7500   2
P3     4      3      0.7500   0.7500   3
P4     4      3      0.7500   0.7500   3
P5     4      3      0.7500   0.7500   2
Sum    19     14     0.7368   0.7368   12



Table 11.1(b). Detection and classification rates of BRLCMV using 50% abundance cut-off threshold 

       N_R    N_RD   R_RD     R_C      N_FA
P1     3      2      0.6667   0.6667   2
P2     4      3      0.7500   0.7500   2
P3     4      3      0.7500   0.7500   3
P4     4      3      0.7500   0.7500   3
P5     4      3      0.7500   0.7500   2
Sum    19     14     0.7368   0.7368   12



Table 11.1(c). Detection and classification rates of FV using 50% abundance cut-off threshold 

       N_R    N_RD   R_RD     R_C      N_FA
P1     3      0      0.0000   0.0000   239
P2     4      4      1.0000   1.0000   3089
P3     4      4      1.0000   1.0000   8
P4     4      4      1.0000   1.0000   2033
P5     4      4      1.0000   1.0000   1232
Sum    19     16     1.0000   1.0000   6601



Table 11.1(d). Detection and classification rates of OSP using 50% abundance cut-off threshold 

       N_R    N_RD   R_RD     R_C      N_FA
P1     3      0      0.0000   0.0000   242
P2     4      4      1.0000   1.0000   3736
P3     4      4      1.0000   1.0000   105
P4     4      4      1.0000   1.0000   1429
P5     4      4      1.0000   1.0000   1209
Sum    19     16     1.0000   1.0000   6721



Table 11.2(a). Detection and classification rates of LCMV using 25% abundance cut-off threshold 

       N_R    N_RD   R_RD     R_C      N_FA
P1     3      3      1.0000   1.0000   8
P2     4      4      1.0000   1.0000   10
P3     4      4      1.0000   1.0000   5
P4     4      4      1.0000   1.0000   5
P5     4      4      1.0000   1.0000   4
Sum    19     19     1.0000   1.0000   32














Table 11.2(b). Detection and classification rates of BRLCMV using 25% abundance cut-off threshold 

       N_R    N_RD   R_RD     R_C      N_FA
P1     3      3      1.0000   1.0000   8
P2     4      4      1.0000   1.0000   10
P3     4      4      1.0000   1.0000   5
P4     4      4      1.0000   1.0000   5
P5     4      4      1.0000   1.0000   4
Sum    19     19     1.0000   1.0000   32



Table 11.2(c). Detection and classification rates of FV using 25% abundance cut-off threshold 

       N_R    N_RD   R_RD     R_C      N_FA
P1     3      3      1.0000   1.0000   3240
P2     4      4      1.0000   1.0000   4011
P3     4      4      1.0000   1.0000   2519
P4     4      4      1.0000   1.0000   3963
P5     4      4      1.0000   1.0000   3950
Sum    19     19     1.0000   1.0000   17683



Table 11.2(d). Detection and classification rates of OSP using 25% abundance cut-off threshold 

       N_R    N_RD   R_RD     R_C      N_FA
P1     3      3      1.0000   1.0000   3286
P2     4      4      1.0000   1.0000   4034
P3     4      4      1.0000   1.0000   3150
P4     4      4      1.0000   1.0000   3896
P5     4      4      1.0000   1.0000   3968
Sum    19     19     1.0000   1.0000   18334



In these tables, $N$ is the total number of panel pixels, $N_R(\mathrm{P}i)$ is the number of red panel 
pixels in row $i$, $N_{RD}(\mathrm{P}i)$ is the number of red panel pixels detected and classified correctly 
in row $i$, $N_{FA}$ is the number of false alarm pixels for P$i$, $R_{RD}(\mathrm{P}i)$ is the detection rate 
defined by $R_{RD}(\mathrm{P}i) = N_{RD}(\mathrm{P}i)/N_R(\mathrm{P}i)$ for P$i$ and $R_C = \sum_{i=1}^{5}\left(N_R(\mathrm{P}i)/N\right)R_{RD}(\mathrm{P}i)$ is the 
classification rate. 

As shown in these tables, the FV and OSP classifiers achieved 100% detection rate 
and 100% classification rate at 25%, but 0% for both detection and classification rates 
of the panels in row 1 at 50%. On the other hand, the LCMV and BRLCMV classifiers performed identically 
to each other at both 50% and 25%. In both cases, they outperformed the FV and the OSP classifiers. 
Although all four classifiers achieved 100% detection and classification rates at 25%, the 
number of false alarm pixels generated by the FV and OSP classifiers was significantly 
greater (more than 500 times) than that produced by the LCMV and BRLCMV 
classifiers. 

Analogous to Fig. 9.5, 3-D ROC curves and 2-D ROC curves can be plotted based 
on the target hit rate $R_H$ defined by (9.27) versus the false alarm rate $R_{FA}$ via the a%MPCV 
defined by (9.3) for performance evaluation. In this case, we compare the performance of 
all the five LCMV classifiers, LCMV, BRLCMV, WTACEM, SCEM, and MTCEM, 
against that of the FV and OSP. Figs. 11.8(a-d) show the 3-D and 2-D ROC curves of 
the five LCMV-based classifiers and the FV and OSP classifiers. As demonstrated in these 
figures, the ROC curves fall into two categories, one produced by the five LCMV-based 
classifiers whose ROC curves were very close, and another by the FV and OSP 
classifiers, which performed very closely to each other but substantially worse than the LCMV-based 
classifiers. It should be noted that the 3-D ROC curves in Fig. 11.8(a) were generated by 
the a%MPCV with the abundance percentage a% varying as a thresholding value from 100% 
down to 0%. Their corresponding 2-D ROC curves were plotted based on the resulting 
target hit rate $R_H$ versus the false alarm rate $R_{FA}$. 
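
The threshold sweep behind such ROC curves can be sketched as follows. Since (9.3) and (9.27) are defined in Chapter 9 and not reproduced here, this sketch makes its own assumptions: abundances are normalized to [0, 1] before an a% cut-off is applied, the hit rate is the fraction of ground-truth target pixels detected, and the false alarm rate is the fraction of non-target pixels detected. The data and function names are hypothetical.

```python
import numpy as np

def roc_by_abundance_cutoff(abundance, truth, num=101):
    """Sweep the cut-off from 100% down to 0% over normalized abundances
    and return the hit rate and false alarm rate at each cut-off."""
    a = (abundance - abundance.min()) / (abundance.max() - abundance.min())
    cutoffs = np.linspace(1.0, 0.0, num)
    r_h = np.empty(num)
    r_fa = np.empty(num)
    for k, c in enumerate(cutoffs):
        detected = a >= c
        r_h[k] = np.sum(detected & truth) / truth.sum()
        r_fa[k] = np.sum(detected & ~truth) / (~truth).sum()
    return cutoffs, r_h, r_fa

def area_under_curve(r_h, r_fa):
    """Trapezoidal area under the hit rate plotted against the false alarm rate."""
    order = np.argsort(r_fa, kind="stable")
    x, y = r_fa[order], r_h[order]
    return float(np.sum(np.diff(x) * (y[1:] + y[:-1]) / 2.0))

rng = np.random.default_rng(5)
truth = np.zeros(1000, dtype=bool)
truth[:19] = True                                    # 19 hypothetical target pixels
abundance = 0.1 * rng.standard_normal(1000) + 0.8 * truth
cutoffs, r_h, r_fa = roc_by_abundance_cutoff(abundance, truth)
auc = area_under_curve(r_h, r_fa)
```

Plotting `r_h` against `r_fa` gives a 2-D ROC curve, and plotting either against `cutoffs` shows how the threshold affects each rate separately.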






Figure 11.8. 3-D ROC and 2-D ROC curves of LCMV, BRLCMV, WTACEM, SCEM, MTCEM, FV and OSP 



As we can see from Fig. 11.8, the 3-D ROC curves in Fig. 11.8(a) show the 
performance of a classifier as a function of three parameters $R_H$, $R_{FA}$ and a%. While the 2-D 
ROC curves of ($R_H$, $R_{FA}$) in Fig. 11.8(b) provide the mean target hit rate of a classifier 
versus the mean false alarm rate, the 2-D curves of ($R_H$, a%) and ($R_{FA}$, a%) in Figs. 11.8(c-d) 
indicate how a threshold value of a% affects the performance of a classifier. Table 11.3 
shows the mean target hit rates by calculating the areas under their 2-D ROC curves in 
Fig. 11.8(b). 



Table 11.3. Detection rates produced by LCMV, BRLCMV, WTACEM, SCEM, MTCEM, FV and OSP 

       MTCEM    LCMV     BRLCMV   WTACEM   SCEM     OSP      FV
R_H    0.7334   0.7376   0.7372   0.7805   0.7769   0.4302   0.4578


It is interesting to note that in terms of detecting R and Y target pixels, WTACEM 
was the best among the five LCMV-based classifiers, while the OSP classifier was the worst. Also, 
according to Fig. 11.8(c), the LCMV, BRLCMV, MTCEM and SCEM classifiers performed 
very similarly where their $R_H$ dropped rapidly around a% = 11.8% compared to the 
WTACEM whose $R_H$ dropped drastically around a% = 5%. On the other hand, the FV 
and the OSP classifiers performed very closely where their $R_H$ decreased gradually with 
two sudden drops occurring around 35% and 65%. This demonstrated that the $R_H$ of the 
five LCMV-based classifiers was sensitive to small values of a% and that their $R_H$ dropped 
below 0.5 after 11.8%. So, the overall performance of these five classifiers was offset by 
a% greater than 11.8%. However, this is not the case for the FV and the OSP classifiers 
whose overall performance was significantly offset by a% greater than 65%. Similar 
phenomena for $R_{FA}$ were also observed in the 2-D curves of ($R_{FA}$, a%) in Fig. 11.8(d). 
Since $R_H$ and $R_{FA}$ trade off against each other, a compromise can be obtained from the 2-D 
ROC curves of ($R_H$, $R_{FA}$) in Fig. 11.8(b). 

In order for the five LCMV-based classifiers to classify the 15-panel HYDICE scene in a single 
image, five colors, red, green, blue, yellow and magenta, were assigned to highlight the 
detected panels in rows 1-5 respectively. Fig. 11.9 shows the classification results 
produced by these five classifiers. As we can see, there is no visible difference among 
the five color images. 




(a) LCMV (b) BRLCMV (c) MTCEM (d) WTACEM (e) SCEM 

Figure 11.9. Classification of LCMV, BRLCMV, MTCEM, WTACEM and SCEM using a color assignment 

11.8 REAL-TIME IMPLEMENTATION FOR LCMV CLASSIFIERS 

As described in Chapter 4, the sample spectral correlation matrix $\mathbf{R}_{L\times L}$ allows us to 
implement the LCMV classifier in real time via a QR-decomposition. Although the 
real-time processing implemented in Chapter 4 was carried out on a line-by-line basis, it 
can also be implemented in a pixel-by-pixel manner where the causal information is 
updated pixel-by-pixel rather than line-by-line. So, as a complement to Chapter 4, we 
will implement the following experiments for the TCIM classifier in real time on a 
pixel-by-pixel basis. The hyperspectral images to be used for experiments are those in Figs. 
1.6 and 1.7. 

Example 11.4.1 (LCVF AVIRIS Image) 

The AVIRIS image in Fig. 1.6(a) contains five signatures of interest, identified as 
"vegetation", "cinders", "shade", "playa (dry lakebed)" and "rhyolite". So, 
it requires five colors for target classification. In order to achieve the best possible visual 
discrimination, green, white, blue, olive and red were selected for vegetation, cinders, 
shade, playa and rhyolite respectively. The choice of these 
colors is our preference, but can be arbitrary. To implement these five colors for the 
TCIM classifier, the required number of rows is 5. Since there are five targets, 5 column 
vectors are required. The constraint matrix $\mathbf{C}$ therefore has size 
$k = p = 5$ and was chosen to be the 5x5 identity matrix 
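
$$
C = [\,c_1\ c_2\ c_3\ c_4\ c_5\,] =
\begin{bmatrix}
1 & 0 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 \\
0 & 0 & 0 & 1 & 0 \\
0 & 0 & 0 & 0 & 1
\end{bmatrix}
\qquad (11.21)
$$

with the columns $c_1, \dots, c_5$ assigned to green, white, blue, olive and red respectively. 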



Each column vector $c_i = (c_{i1}, c_{i2}, c_{i3}, c_{i4}, c_{i5})^T$ with $c_{ii} = 1$ and $c_{ij} = 0$ for $i \neq j$ is used 
to constrain one single specific target while nulling out all the other targets. As an 
example, we can set $c_{11} = 1$ in $c_1$ to constrain d = [vegetation] in (11.21) to classify 
vegetation and $c_{1j} = 0$ for $2 \le j \le 5$ in $c_1$ to null out U = [cinders, shade, playa, rhyolite] 
so that vegetation can be detected and classified by green while the other targets are 
eliminated at the same time. The real-time processing was carried out on a pixel-by-pixel 
basis in a top-to-bottom and left-to-right fashion. 
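
The color assignment described by (11.21) can be sketched as a mapping from the five filter outputs of a pixel to one RGB value. The text does not say how the colors are combined, so the blending rule below (a weighted sum of the five colors, with outputs clipped to [0, 1]) is an assumption, as are the RGB triples and the names `PALETTE` and `colorize`.

```python
import numpy as np

# One RGB color per column of C in (11.21); the RGB triples are assumed values.
PALETTE = np.array([
    [0, 255, 0],      # c1: vegetation -> green
    [255, 255, 255],  # c2: cinders    -> white
    [0, 0, 255],      # c3: shade      -> blue
    [128, 128, 0],    # c4: playa      -> olive
    [255, 0, 0],      # c5: rhyolite   -> red
], dtype=float)

def colorize(outputs):
    """Blend colors by filter output: (n_pixels, 5) -> (n_pixels, 3) uint8."""
    weights = np.clip(outputs, 0.0, 1.0)       # ignore negative or >1 outputs
    rgb = np.clip(weights @ PALETTE, 0, 255)   # weighted sum of the five colors
    return np.rint(rgb).astype(np.uint8)

outputs = np.array([
    [1.0, 0.0, 0.0, 0.0, 0.0],   # pure vegetation
    [0.0, 0.5, 0.0, 0.5, 0.0],   # half cinders, half playa
    [0.0, 0.0, 0.0, 0.0, 0.0],   # no target detected
])
rgb = colorize(outputs)
```

Under this rule a pixel that responds to two targets shows a mixture of their colors, while a pixel with no response stays black.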

Fig. 11.10 shows a sequence of images generated by the pixel-by-pixel real-time 
process, where the five targets were detected in the images labeled (b)-(f). It started with 
random colors and no target pixels detected in Fig. 11.10(a) until pixel 188 in line 5, 
where the first vegetation pixel was detected and highlighted in green in Fig. 
11.10(b). This was followed by detection of the first shade pixel, in blue, at pixel 89 in 
line 6 in Fig. 11.10(c), the first cinders pixel, in white, at pixel 98 in line 10 in Fig. 
11.10(d), the first playa pixel, in olive, at pixel 187 in line 85 in Fig. 11.10(e) and the 
first rhyolite pixel, in red, at pixel 60 in line 110 in Fig. 11.10(f). Figs. 11.10(g) and 
11.10(h) show the color and gray scale images after the processing was completed. The 
color map in Fig. 11.10(g) offers an advantage that the gray scale image cannot provide: 
visual discrimination of different detected targets in one single image. Interestingly, 
an anomaly was detected at pixel 182 in line 84 in Fig. 11.10(f), marked by a white 
circle at the top edge of the lake, even though we had no prior knowledge about the 
anomaly. This anomaly was detected before the first playa pixel was detected in Fig. 
11.10(e). It is a two-pixel anomaly that was detected in both the color and gray scale images in 
Figs. 11.10(g-h), but is invisible in Fig. 3.1 by visual inspection. It was also detected in 
Fig. 6.1(c) by the RXD described in Chapter 6. Its detection was missed in Harsanyi and 
Chang (1994) because the OSP classifier used there was supervised. On close examination, 
its color seemed to be a mixture of white and olive. This indicated that the anomaly 
could be a mixture of cinders and playa. Obviously, such information cannot be provided 
by the gray scale image in Fig. 11.10(h). 



(a) no target detected 
(b) first detected vegetation pixel (line 5 at pixel 188) 
(c) first detected shade pixel (line 6 at pixel 89) 
(d) first detected cinders pixel (line 10 at pixel 98) 
Example 11.4.2 (15-Panel HYDICE scene) 

As shown in Fig. 1.7, there are 15 panels in the image scene, where the panels in the 
same row are made from the same material. So, there are five distinct materials, which 
require five different colors for panel classification. Following an experiment similar 
to that in Example 11.4.1, red, green, white, yellow and magenta were selected to 
specify panels from row 1 to row 5 respectively to achieve the best possible visual 
discrimination among these 15 panels. To implement these five colors for the TCIM 
classifier, the required number of rows is 5. On the other hand, to classify the 15 panels 
into 5 distinct material classes, the required number of columns must be 5. This results 
in a constraint matrix C with size k = m = 5 specified by the 5x5 identity matrix 



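$$
C = [\,c_1\ c_2\ c_3\ c_4\ c_5\,] =
\begin{bmatrix}
1 & 0 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 \\
0 & 0 & 0 & 1 & 0 \\
0 & 0 & 0 & 0 & 1
\end{bmatrix}
\qquad (11.22)
$$

with the columns $c_1, \dots, c_5$ assigned to red, green, white, yellow and magenta respectively. 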



The constraint vectors $c_1, c_2, c_3, c_4, c_5$ in C are used to detect and classify the 15 panels. 
Each column vector $c_i = (c_{i1}, c_{i2}, c_{i3}, c_{i4}, c_{i5})^T$ with $c_{ii} = 1$ and $c_{ij} = 0$ for $j \neq i$ 
corresponds to one constraint vector in (11.1) with $d = P_i$ and $U = [P_j]_{j \neq i}$. For 
example, in order to detect and classify the panels in row 1, we set $c_{11} = 1$ in $c_1$ to 
constrain d = [P1] and $c_{1j} = 0$ for $2 \le j \le 5$ in $c_1$ to null out U = [P2 P3 P4 P5] so 
that panels in row 1 can be detected and classified by red while panels in other rows are 
eliminated at the same time. The real-time processing was carried out on a pixel-by-pixel 
basis in a top-to-bottom and left-to-right fashion. Figs. 11.11(a)-11.11(o) show a 
sequence of images resulting from implementing the TCIM classifier in a pixel-by-pixel 
real-time process with the constraint matrix C specified by (11.22). 

In the beginning of data processing, no panels were detected. So, random colors were 
shown in the image in Fig. 11.11(a). As soon as the process detected the first target, i.e., 
the panel $p_{11}$ in row 1, the detected $p_{11}$ pixel was immediately changed to red, the color 
chosen to specify the panels in row 1, while the background suddenly changed to dark. 
Then the process continued to detect the panels in row 1 with red, e.g. 
$p_{12}$ in Fig. 11.11(c), until it detected the first panel pixel in row 2, $p_{21}$, in Fig. 
11.11(d), where the pre-assigned green color was used to highlight the panel $p_{21}$ and 
discriminate it from the red panel pixels in row 1. Unfortunately, it 
missed the detection of the panel pixel $p_{13}$ due to its very small abundance. 
Figs. 11.11(e-o) show how the real-time process detected the target panels pixel by pixel: 
$p_{22}$, $p_{23}$ with green, $p_{31}$, $p_{32}$, $p_{33}$ with white, $p_{41}$, $p_{42}$, $p_{43}$ with yellow and 
$p_{51}$, $p_{52}$, $p_{53}$ with magenta, in sequence. Finally, Fig. 11.11(p) shows the complete 
detection. When the last pixel in the image was processed, the classification process was 
also complete and all the 15 panels were successfully classified by five colors: red for the 
panels in row 1, green for the panels in row 2, white for the panels in row 3, yellow for 
the panels in row 4 and magenta for the panels in row 5. To demonstrate how 
important it is to use a color map to visualize the classification result in Fig. 11.11(p), Fig. 
11.11(q) shows only the gray scale image with all the detected 15 panels. As we can see 
from Fig. 11.11(q), there is no way to discriminate the panels of one row from the panels 
of another row. 
Figure 11.11. Real-time pixel-by-pixel implementation of the 15-panel HYDICE scene 

Several experiments were also conducted by including some partial knowledge in the 
TCIM-based target classifier, for instance, grass and tree in the image scene. These two 
natural background signatures can be used as undesired targets for background 
annihilation. The results are not included here since there was not much improvement. 
This is because there are large fields of trees and grass in the image scene. Two 
background signatures were not sufficient to represent the image background. 



11.9 CONCLUSIONS 

This chapter presents several versions of LCMV classifiers, which can detect and 
classify multiple targets of distinct types in a single image by a single operation. More 
importantly, the LCMV classifiers can be implemented by a real-time process on a 
pixel-by-pixel basis. This expands the capability of the LCMV detectors described in Chapter 4 to 
LCMV multiple-target classifiers. The LCMV multiple-target classifiers use multiple 
constraints to accommodate similar target signatures, which eases the sensitivity to 
target information and helps recover missing target pixels. In addition, the LCMV multiple-target 
classifiers also take advantage of multiple constraints as a color assignment mechanism 
so that all the detected targets can be displayed in one single image with different colors. 
Although the LCMV classifiers studied in this chapter are supervised and require target 
knowledge, they can be further extended to unsupervised classifiers by including the 
unsupervised algorithms in Chapter 5 or the anomaly detectors described in Chapter 6. This 
will be discussed in Chapters 13 and 14. Finally, Table 11.4 summarizes the assumptions 
and constraints that are imposed in each of the methods presented in this chapter, where the 
CEM has three versions, WTACEM, SCEM and MTCEM, when it is implemented as a 
classifier. 



Table 11.4. 

          linear mixture   desired       undesired     filter      $R_{L \times L}$   background 
          model            targets (D)   targets (U)   vector w 
LCMV      no/no            yes/yes       yes/yes       no/no       yes/no     no/no 
BRLCMV    no/no            yes/yes       yes/yes       no/no       yes/no     no/yes 
WTACEM    no/no            yes/yes       no/no         no/no       yes/no     no/no 
SCEM      no/no            yes/yes       no/no         no/no       yes/no     no/no 
MTCEM     no/no            yes/yes       no/no         no/no       yes/no     no/no 
TCIMF     no/no            yes/yes       yes/yes       no/no       yes/no     no/no 
FV        yes/yes          yes/yes       no/no         no/yes      no/no      no/no 
OSP       yes/no           yes/no        no/no         no/no       no/no      no/no 



For example, a “yes/no” at the row of the OSP crossed with the column of the linear 
mixture model in Table 11.4 means that the OSP assumes a linear mixture model but 
does not impose constraints on this model. 
TARGET SIGNATURE-CONSTRAINED MIXED PIXEL 
CLASSIFICATION (TSCMPC): LINEARLY 
CONSTRAINED DISCRIMINANT 
ANALYSIS (LCDA) 



Fisher's Linear Discriminant Analysis (LDA) is a widely used technique for pattern 
classification and was discussed in Chapter 9. It uses Fisher's ratio, the ratio of the 
between-class scatter matrix to the within-class scatter matrix, as an optimal criterion to derive a set of 
feature vectors by which high dimensional data can be projected onto a low-dimensional 
feature space in the sense of maximizing class separability. This chapter presents an 
approach derived from Fisher's LDA, referred to as linear constrained distance-based 
discriminant analysis (LCDA), that uses a criterion similar to Fisher's ratio for 
classification. It maximizes the ratio of inter-distance between classes to intra-distance 
within classes while imposing a constraint that all the class means must be aligned along 
predetermined directions. Interestingly, the LCDA classifier has the same 
operational form as OSP in Chapter 8, but achieves better classification as a result 
of the target signature constraints. For this reason, LCDA can be viewed as a 
constrained version of OSP. 



12.1 INTRODUCTION 

Many mixed pixel classification (MPC) methods have been proposed in the past, 
such as the linear unmixing methods described in Chapter 8. One of the principal differences 
between pure and mixed pixel classification is that the former is a class-labeling process, 
whereas the latter is actually signature abundance estimation. In Chapter 9 we have seen 
that MPC generally performed better than pure pixel classification (PPC) when the 
estimated target signature abundance fractions were used for classification. Additionally, 
as shown in Chapter 9, if MPC was converted to PPC, Fisher's linear discriminant 
analysis (LDA) was among the best. This suggested that applying pure pixel-based LDA 
directly to mixed pixel classification problems may not be an effective way to best utilize 
Fisher's LDA. In magnetic resonance imaging (MRI) applications, Soltanian-Zadeh et 
al. (1996) recently developed a constrained criterion to 
characterize brain tissues for 3-D feature representation. They used the ratio of the inter-set 
distance (IED) to the intra-set distance (IAD) (Tou and Gonzalez, 1974) as an optimal 
criterion while imposing a constraint that each class center must be aligned along 
predetermined directions. In order to arrive at an analytical solution, Soltanian-Zadeh et 
al. further made an assumption of white Gaussian noise so that IED can be 
simplified to a constant. As a result, maximizing the ratio of IED to IAD is equivalent to 
minimizing IAD. However, the white Gaussian noise assumption may not be valid in 
hyperspectral images, since it has been demonstrated (Chang et al., 1998, Du and Chang, 
1999) that unknown interferers in hyperspectral imagery, such as background signatures and 
clutter, are more severe and dominant than noise and are generally non-Gaussian 
and non-stationary. This chapter presents a linear constrained distance-based discriminant 
analysis (LCDA) for hyperspectral image classification. It implements a criterion similar 
to Fisher's ratio along with a constraint that the vector directions (or spectral shapes) of 
targets of interest must be aligned with a set of specific directions. The idea is derived 
from Soltanian-Zadeh et al.'s approach, but does not require the Gaussian 
assumption. As a matter of fact, the white Gaussian assumption is not necessary and can 
be replaced by a whitening process. This is because minimizing IAD is equivalent to 
minimizing the trace of the data sample covariance matrix K, and a whitening process can 
be applied to decorrelate K without making the white Gaussian assumption. Furthermore, 
after the whitening process is completed, LCDA can be carried out by OSP in Chapter 
8. As a result of using direction constraints, LCDA performs significantly better than 
OSP. From this point of view, LCDA can be thought of as a constrained version of 
OSP. 

Recently, a similar approach, called filter vectors (FV), was also proposed by 
Bowles et al. (1995) and discussed in Chapter 11 (Section 11.3). It assumed that there 
was a set of targets of interest present in an image scene. These targets were then used to 
form a linear mixture model described by (3.1). A set of filter vectors was obtained by 
solving a constrained optimization problem with a constraint imposed on the targets of 
interest in such a way that their target signatures are orthogonal to each other. Finally, each 
of the resulting filter vectors was used to classify a specific target. Interestingly, if a 
whitening process is included in LCDA prior to classification, Bowles et al.'s FV 
approach can be considered as a special case of LCDA with no need of using the linear 
mixture model. 

Of particular interest is a comparative analysis between LCDA and LCMV, both of 
which impose constraints on target signatures to improve classification 
performance. The advantage of LCMV over LCDA is that LCMV only requires 
knowledge about targets of interest, including desired and undesired targets. By contrast, 
LCDA requires complete knowledge of the target signatures of interest, but in that case it 
performs more effectively than LCMV, specifically when two targets have 
very similar signatures. 

It should be noted that the idea of using signal direction constraints is not new and 
has been applied in various areas, such as the Minimum Variance Distortionless Response 
(MVDR) beamformer in array processing (Van Veen and Buckley, 1988, Haykin, 1996), 
chemical remote sensing (Althouse and Chang, 1991) and LCMV in Chapters 4 and 11. 



12.2 LCDA 

In Fisher's discriminant analysis, the discriminant vectors are generally subject to no 
constraints on their directions. However, in many practical applications, there may exist 
some partial knowledge that can be used to constrain desired features along certain 
directions to minimize the effects of undesired features. As discussed in Chapters 4 and 
11, LCMV constrains a set of desired target signatures while minimizing its output 
energy. In MRI, Soltanian-Zadeh et al. (1996) constrained normal 
tissues along prespecified target positions so as to achieve a better clustering for 
visualization. In this section, we follow a similar idea to derive a linear constrained 
discriminant analysis (LCDA) for hyperspectral image classification. 

Let $\phi$ denote a linear transformation that projects a high dimensional data space into a 
low dimensional space. Suppose that there are c classes of interest. The ratio of the 
average between-class distance to the average within-class distance via $\phi$ is given by 






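$$
J(\phi) = \frac{\dfrac{2}{c(c-1)} \sum_{i=1}^{c-1} \sum_{j=i+1}^{c} \left\| \phi(\mu_i) - \phi(\mu_j) \right\|^2}
{\dfrac{1}{cN} \sum_{j=1}^{c} \left[ \sum_{k=1}^{N_j} \left\| \phi(x_k^j) - \phi(\mu_j) \right\|^2 \right]}
\qquad (12.1)
$$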



where the global mean $\mu$ and the local mean $\mu_i$ of the i-th class were defined in Section 
9.2.3. Suppose that there are p classes of interest with $p \le c$, denoted by $\{C_i\}_{i=1}^{p}$. We 
can maximize (12.1) over $\phi$ subject to a constraint that the desired class means $\mu_i$ 
must be aligned with the prespecified target directions $\{m_i\}_{i=1}^{p}$. In other words, 
what we are interested in here is to find an optimal linear transformation $\phi^*$, which 
maximizes 

$J(\phi)$ subject to $m_i = \phi(\mu_i)$ for all i with $1 \le i \le p \le c$. (12.2) 

Here the target directions $\{m_i\}_{i=1}^{p}$ can be interpreted as spectral shapes of target 
signatures as we did for LCMV in Chapters 4 and 11. For example, $m_i^T m_j = \delta_{ij}$ implies 
that the directions of target signatures, $m_i$ and $m_j$, are orthogonal. It also implies that 
the spectral shapes of target signatures are mutually orthogonal. (12.2) outlines a general 
constrained optimization problem, which can be solved numerically. The linear 
transformation $\phi$ in (12.2) can be expressed as a matrix transform via a weight matrix 
$W_{d \times p}$ by 

$y = \phi(x) = W_{d \times p}^T x$ (12.3) 

Now, assume that p = c and $m_i$ in (12.2) is a p×1 unit column vector with one in 
the i-th component and zeros in all other components, that is, 
$m_i = (0 \cdots 0\ 1\ 0 \cdots 0)^T$, so that $m_i$ is a vector along the i-th coordinate in the space $\mathbb{R}^p$. If we 
further assume that the linear transformation $\phi$ in (12.2) is represented by 
$W = [w_1\ w_2 \cdots w_p]$ with $\{w_i\}_{i=1}^{p}$ also linearly independent, then (12.2) becomes 

$\max_{W} J(W)$ subject to $w_i^T \mu_j = \delta_{ij}$ for $1 \le i, j \le p$ (12.4) 

where $\delta_{ij}$ is Kronecker's notation given by $\delta_{ij} = 1$ if $i = j$ and $\delta_{ij} = 0$ if $i \neq j$. 
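The numerator and denominator of $J(\phi)$ in (12.1) can be further reduced to 

$$
\frac{2}{p(p-1)} \sum_{i=1}^{p-1} \sum_{j=i+1}^{p} \left\| \phi(\mu_i) - \phi(\mu_j) \right\|^2
= \frac{2}{p(p-1)} \sum_{i=1}^{p-1} \sum_{j=i+1}^{p} [2] = 2 \qquad (12.5)
$$

$$
\frac{1}{pN} \sum_{j=1}^{p} \left[ \sum_{k=1}^{N_j} \left\| \phi(x_k^j) - \phi(\mu_j) \right\|^2 \right] \qquad (12.6)
$$

Each term in (12.5) equals 2 because the constrained class means $\phi(\mu_i) = m_i$ are distinct unit 
coordinate vectors. Since the numerator in (12.1) becomes a constant given by (12.5), maximizing (12.1) is 
reduced to minimizing (12.6). More specifically, because 
$\| \phi(x) - \phi(\mu) \|^2 = \mathrm{trace}\{ W^T (x - \mu)(x - \mu)^T W \}$ under (12.3), maximizing (12.4) is reduced to 

$$
\min_{W} \ \mathrm{trace} \left\{ W^T \left[ \sum_{j=1}^{p} \sum_{k=1}^{N_j} (x_k^j - \mu_j)(x_k^j - \mu_j)^T \right] W \right\}
\ \text{subject to} \ w_i^T \mu_j = \delta_{ij}. \qquad (12.7)
$$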



It should be noted that the term in the brackets in (12.7) turns out to be the within-class 
scatter matrix $S_W$ in Section 9.2.3. According to Fisher's LDA, minimizing $S_W$ is 
equivalent to minimizing the total scatter matrix $S_T$. If we let K denote the sample 
covariance matrix of all training samples, then $S_T$ and K are related by the following 
equation 

$S_T = N \cdot K$. (12.8) 

Since N is the total number of samples and is a constant, substituting (12.8) into (12.7) 
results in the following equivalent problem 

$\min_{W} \ \mathrm{trace}(W^T K W)$ subject to $w_i^T \mu_j = \delta_{ij}$. (12.9) 

Equation (12.9) is now reduced to a simple problem, that is, how to find a matrix A, 
which decorrelates and whitens the covariance matrix K into an identity matrix. Let A be 
such a matrix. (12.9) can be whitened and further reduced to a simple optimization 
problem given by 

$\min_{W} \left\{ \sum_{i=1}^{p} w_i^T w_i \right\} = \min_{W} \left\{ \sum_{i=1}^{p} \| w_i \|^2 \right\}$ subject to $w_i^T \hat{\mu}_j = \delta_{ij}$. (12.10) 
where $\hat{\mu}_j$ are the vectors resulting from the whitening process using A, i.e., 
$\hat{\mu}_j = A^T \mu_j$. Since $\| w_i \|^2$ is nonnegative for each $1 \le i \le p$, solving (12.10) is equivalent 
to solving 

$\min_{w_i} \ w_i^T w_i$ subject to $w_i^T \hat{\mu}_j = \delta_{ij}$ for each i with $1 \le i \le p$. (12.11) 

The solution $w_i^*$ to (12.11) can then be obtained analytically by 

$w_i^{*T} = (\hat{\mu}_i^T P_U^{\perp} \hat{\mu}_i)^{-1} \hat{\mu}_i^T P_U^{\perp}$ (12.12) 

where 

$P_U^{\perp} = I - U(U^T U)^{-1} U^T$ (12.13) 

and U is the space linearly spanned by all cluster centers but $\hat{\mu}_i$, i.e. $U = \langle \{\hat{\mu}_j\}_{j=1, j \neq i}^{p} \rangle$. 
Surprisingly, the solution specified by (12.12) turns out to be OSP in Chapter 8. This 
implies that the classifier $w_i^*$ of (12.12) derived by LCDA is essentially a constrained 
version of OSP. 
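
As a numerical sanity check on the solution of (12.11), the sketch below builds the minimum-norm vector from the projector in (12.13), scaled so that the constraint on the i-th whitened mean holds, and compares it with the rows of the Moore-Penrose pseudo-inverse, which give the minimum-norm solution of the same set of constraints. The scaling factor, the random test data and the names `osp_solution` and `mu_hat` are assumptions made for illustration.

```python
import numpy as np

def osp_solution(mu_hat, i):
    """Minimum-norm w with w @ mu_hat[:, j] equal to 1 if j == i, else 0."""
    d, p = mu_hat.shape
    U = np.delete(mu_hat, i, axis=1)                    # all centers except the i-th
    P = np.eye(d) - U @ np.linalg.solve(U.T @ U, U.T)   # projector of (12.13)
    v = P @ mu_hat[:, i]
    return v / (mu_hat[:, i] @ v)                       # scale so that w^T mu_i = 1

rng = np.random.default_rng(1)
mu_hat = rng.standard_normal((10, 4))    # hypothetical whitened class means
W = np.column_stack([osp_solution(mu_hat, i) for i in range(4)])
W_pinv = np.linalg.pinv(mu_hat).T        # minimum-norm solution of W^T mu_hat = I
```

The two matrices agree, which illustrates that the orthogonal-subspace-projection form is the minimum-length vector satisfying the constraints when the class means are linearly independent.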

A comment on LCDA is noteworthy. Although the two criteria used in LDA 
and LCDA look similar, in that both calculate a ratio of a between-class measure to a 
within-class measure, there are differences between LDA and LCDA. LDA 
maximizes the between-class scatter matrix, while LCDA minimizes the intra-distance 
within classes. Another difference is that the number of discriminant vectors resulting 
from LDA is one less than the total number of classes of interest, p, whereas the 
number of projection vectors generated by LCDA is the same as the number of classes of 
interest, p. Most significantly, LCDA can be regarded as a variant of OSP, but LDA 
cannot. 



12.3 WHITENING PROCESS FOR LCDA 

In order to reduce the optimization problem given by (12.9) to the one specified by 
(12.10), Soltanian-Zadeh et al. (1996) made the white Gaussian noise assumption for 
MRI to arrive at (12.10). As a matter of fact, this assumption can be removed by finding 
a whitening matrix A for (12.9) to derive (12.10). 

Assume that $\{\lambda_l\}_{l=1}^{d}$ are the eigenvalues of the sample covariance matrix K or the 
total scatter matrix $S_T$ and $\{v_l\}_{l=1}^{d}$ are their corresponding normalized unit eigenvectors. 
Since $S_T$ or K is nonnegative definite, all eigenvalues are nonnegative. There exists a 
unitary matrix Q such that K can be expressed by 

$Q^T K Q = \Lambda$ (12.14) 

where $Q = [v_1\ v_2 \cdots v_d]$ is an eigenvector matrix made up of all eigenvectors $\{v_l\}_{l=1}^{d}$ and 
$\Lambda = \mathrm{diag}\{\lambda_l\}_{l=1}^{d}$ is a diagonal matrix with $\{\lambda_l\}_{l=1}^{d}$ as diagonal elements. 

If we let $\Lambda^{-1/2} = \mathrm{diag}\{\lambda_l^{-1/2}\}_{l=1}^{d}$, multiplying both sides of (12.14) by $\Lambda^{-1/2}$ results in 

$\Lambda^{-1/2} Q^T K Q \Lambda^{-1/2} = I$ (12.15) 

From (12.15), we obtain the desired matrix A for (12.9), which is given by 

$A = Q \Lambda^{-1/2}$ (12.16) 

so that $A^T K A = I$. 
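
The whitening step of (12.14)-(12.16) can be checked numerically. The sketch below assumes that K is strictly positive definite, since a zero eigenvalue would make $\Lambda^{-1/2}$ undefined; in that case some regularization or dimensionality reduction would be needed, which is not covered here. The function name and the synthetic data are hypothetical.

```python
import numpy as np

def whitening_matrix(K):
    """Return A = Q Lambda^{-1/2} as in (12.16), assuming K is positive definite."""
    eigvals, Q = np.linalg.eigh(K)     # K is symmetric, so Q has orthonormal columns
    return Q / np.sqrt(eigvals)        # scales column l of Q by lambda_l^{-1/2}

rng = np.random.default_rng(2)
X = rng.standard_normal((500, 6)) @ rng.standard_normal((6, 6))   # correlated samples
K = np.cov(X, rowvar=False)            # sample covariance matrix
A = whitening_matrix(K)
```

Applying `A.T` to each sample decorrelates the data, so the covariance of the transformed samples is the identity matrix.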

Using (12.12) and (12.16), the linearly constrained Euclidean distance-based 
discriminant analysis optimization problem specified by (12.2) can be solved as follows: 

LCDA Algorithm 

1. Find the cluster centers $\{\mu_i\}_{i=1}^{p}$ of the p classes and the sample covariance matrix K. 

2. Find the eigenvalues $\{\lambda_l\}_{l=1}^{d}$ and their corresponding eigenvectors $\{v_l\}_{l=1}^{d}$ of K to 
form the unitary matrix $Q = [v_1\ v_2 \cdots v_d]$ and the diagonal matrix $\Lambda = \mathrm{diag}\{\lambda_l\}_{l=1}^{d}$. 

3. Form the desired matrix $A = Q \Lambda^{-1/2}$ using (12.16). 

4. Find the A-transformed cluster means $\hat{\mu}_i = A^T \mu_i$ for each i with $1 \le i \le p$. 

5. Find $w_i^*$ using (12.12) for each i with $1 \le i \le p$, with $P_U^{\perp}$ given by (12.13). This step is 
the classification step, where $w_i^*$ is used to classify data samples into the i-th class. 
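
The five steps above can be sketched in a few lines of NumPy. This is an illustrative sketch under stated assumptions rather than the implementation used for the experiments: it assumes labeled training samples, a positive definite K, and a step-5 solution scaled so that each projection vector gives unit output on its own class mean. The function name `lcda_fit` and the synthetic data are hypothetical.

```python
import numpy as np

def lcda_fit(X, labels):
    """Return (W, mu): W is (d, p) with one projection vector per class,
    mu is (d, p) with one class mean per column."""
    p = int(labels.max()) + 1
    d = X.shape[1]
    # Step 1: cluster centers and sample covariance matrix K
    mu = np.column_stack([X[labels == i].mean(axis=0) for i in range(p)])
    K = np.cov(X, rowvar=False)
    # Steps 2-3: eigen-decomposition and whitening matrix A = Q Lambda^{-1/2}
    eigvals, Q = np.linalg.eigh(K)
    A = Q / np.sqrt(eigvals)
    # Step 4: A-transformed cluster means
    mu_hat = A.T @ mu
    # Step 5: OSP-type solution for each class in the whitened space
    W_hat = np.empty((d, p))
    for i in range(p):
        U = np.delete(mu_hat, i, axis=1)
        P = np.eye(d) - U @ np.linalg.solve(U.T @ U, U.T)
        v = P @ mu_hat[:, i]
        W_hat[:, i] = v / (mu_hat[:, i] @ v)
    # A sample r is whitened as A^T r, so w_hat^T (A^T r) = (A w_hat)^T r
    return A @ W_hat, mu

rng = np.random.default_rng(4)
centers = 5.0 * rng.standard_normal((3, 6))     # hypothetical class signatures
labels = np.repeat(np.arange(3), 100)
X = centers[labels] + 0.3 * rng.standard_normal((300, 6))
W, mu = lcda_fit(X, labels)
scores = X @ W     # column i is the output of the i-th projection vector
```

Returning `A @ W_hat` folds the whitening into the projection vectors, so new samples can be scored directly in the original data space.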



12.4 BOWLES ET AL.'S FILTER VECTORS (FV) ALGORITHM 

In Section 11.3, we have shown that Bowles et al.'s FV algorithm (Bowles et al., 
1995) was a special case of the BRLCMV classifier. In this section, we will further show 
that Bowles et al.'s FV algorithm can also be interpreted as a special case of LCDA. 

First of all, recalling (3.1), a linear mixture of p target signatures $m_1, m_2, \dots, m_p$ in 
an image pixel vector r can be expressed by 

$r = M\alpha + n$ (3.1) 

The first condition imposed on (3.1) by Bowles et al.'s filter vectors $\{w_i\}_{i=1}^{p}$ is (11.5), 
which says that a filter vector $w_i$ must be orthogonal to all the signatures $\{m_j\}_{j \neq i}$ 
except the one, $m_i$, that is used for training. This condition is exactly the same constraint 
imposed by LCDA, (12.4). The second condition proposed by Bowles et al.'s filter 
vectors is (11.6), which states that the sum of all the components in each filter vector 
must be zero. This implies that the mean of $w_i$ must be zero. Bowles et al.'s third 
condition, (11.7), which minimizes the vector length of each of the filter vectors, is identical to 
the objective function in (12.11). Although no explicit rationale was provided in 
Bowles et al. (1995), we can show that the need for these conditions actually arises from 
the following facts. 

Let $y_i$ denote the output of the filter vector $w_i$ operating on r. Then 

$$
y_i = w_i \cdot r = w_i^T r = w_i^T (M\alpha + n) = w_i^T M \alpha + w_i^T n
= \sum_{j=1}^{p} \alpha_j w_i^T m_j + w_i^T n = \alpha_i + w_i^T n \qquad (12.17)
$$

and its output energy is given by 

$$
E[y_i^2] = \alpha_i^2 + 2\alpha_i E[w_i^T n] + E[(w_i^T n)^2]. \qquad (12.18)
$$

Equation (12.17) implies that in order for the filter output $y_i$ to accurately approximate 
$\alpha_i$, it requires that $w_i \cdot n = w_i^T n$ be zero. On the other hand, (12.18) indicates that in 
order for the output $y_i$ to accurately approximate $\alpha_i$, we must minimize 
$2\alpha_i E[w_i^T n] + E[(w_i^T n)^2]$. One way to achieve these two goals is to make the 
assumption that the noise in model (3.1) is a zero-mean random vector process, i.e., 

$E[n] = 0$ (12.19) 

so that $E[w_i^T n] = w_i^T E[n] = 0$. By taking the expectation of $y_i$ and using (12.19), 
(12.17) becomes 

$E[y_i] = E[w_i^T r] = \alpha_i$. (12.20) 

Assuming the noise variance is given by $\sigma^2$ and taking the expectation of $y_i^2$, (12.18) is 
reduced to 

$E[y_i^2] = E[(\alpha_i + w_i^T n)^2] = \alpha_i^2 + \sigma^2 w_i^T w_i$ (12.21) 

which turns out to be exactly the same second condition given in Pesses (1999). Both 
(12.19) and (12.20) suggest that in order for the filter output $y_i = w_i^T r$ to accurately 
represent $\alpha_i$, the abundance fraction of $m_i$, we must minimize $w_i^T w_i$ in (12.21) with 
respect to $w_i$ for each $1 \le i \le p$. As discussed in Section 11.3, the objective of Bowles 
et al.'s second condition, i.e., $w_i \cdot \mathbf{1} = w_i^T \mathbf{1} = 0$, is to remove the effects caused by 
image background, not to make $w_i \cdot n = w_i^T n = 0$ as proposed by (12.19). However, as 
shown in (12.20), if the noise is assumed to be a zero-mean white random vector process 
specified by (12.19), $E[y_i] = \alpha_i$. Therefore, if we replace Bowles et al.'s second 
condition, $w_i \cdot \mathbf{1} = 0$ given by (11.6), with (12.19), which can be easily 
satisfied by a whitening process described in Section 12.3, Bowles et al.'s filter vectors 
become the ones that minimize (12.21). This is equivalent to solving exactly the same 
constrained optimization problem for LCDA that is specified by (12.11). With this 
interpretation, Bowles et al.'s filter vectors method essentially performs as a different 
version of LCDA with zero-mean noise process. 
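
The role of the zero-mean white noise assumption can be illustrated with a small Monte Carlo experiment. The sketch below assumes Gaussian noise with covariance $\sigma^2 I$ and takes the filter vectors to be the minimum-length vectors satisfying the orthogonality condition; the signature matrix, the abundance values and the sample size are hypothetical choices made only for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)
L, p, sigma = 20, 3, 0.1
M = rng.random((L, p))                 # hypothetical signature matrix
alpha = np.array([0.5, 0.3, 0.2])      # hypothetical abundance fractions
W = np.linalg.pinv(M).T                # minimum-length w_i with w_i^T m_j = delta_ij

n = sigma * rng.standard_normal((100_000, L))   # zero-mean white noise samples
Y = (M @ alpha + n) @ W                         # row k holds y_1..y_p for sample k

mean_y = Y.mean(axis=0)                         # sample estimate of E[y_i]
energy = (Y ** 2).mean(axis=0)                  # sample estimate of E[y_i^2]
predicted = alpha ** 2 + sigma ** 2 * np.sum(W ** 2, axis=0)   # right side of (12.21)
```

The sample mean of each output stays close to the corresponding abundance fraction, and the sample energy stays close to the abundance squared plus the noise variance times the squared filter length, so shorter filter vectors leave less noise energy in the output.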



12.5 COMPUTER SIMULATIONS AND HYPERSPECTRAL IMAGE 
EXPERIMENTS 



In order to evaluate the performance of LCDA, we conduct computer simulations and real 
hyperspectral image experiments in comparison with the results produced by OSP, LDA and LCMV. 

Example 12.1 (Computer Simulations) 



The following computer simulations are a follow-up to those in Section 11.6 
of Chapter 11. The same 250 simulated mixed pixels were used in this 
example. Fig. 12.1 shows the detection and classification results produced by the 
LCDA, the FV and the OSP classifiers (from left to right) with the target signature 
matrix M = [blackbrush, creosote leaves, sagebrush]. It should be noted that the plots 
in Fig. 12.1 use different scales to show their ability in classification. Apparently, 
LCDA performed significantly better than did FV and OSP: the LCDA classifier 
correctly detected and classified most target pixels, while the FV and the OSP classifiers 
failed to detect all target pixels. The reason is that the three target signatures, 
blackbrush, creosote leaves and sagebrush, are very similar according to 
Tables 2.3-2.6. In this case, LCDA shows its strength in classification by constraining 
their target signature directions. It is worth noting that none of these three classifiers 
detected the correct amounts of target abundance fractions. If we further compare the results 
in Fig. 12.1 to the results in Fig. 11.1, LCMV and BRLCMV are the best classifiers 
among these five classifiers. 















(c) sagebrush 

Figure 12.1. Detection and classification results of the LCDA, FV and OSP classifiers (from left to right) 
with M = [blackbrush, creosote leaves, sagebrush]. 

Computer simulations similar to those in Fig. 11.2 were also conducted using the 
target signature matrix M = [blackbrush, creosote leaves, sagebrush, dry grass, red soil] 
where the red soil and dry grass were simulated as background signatures. The 
classification results of the LCDA, the FV and the OSP classifiers are shown in Fig. 
12.2. It is interesting to note that the performance of all three classifiers 
improved, specifically that of the FV and the OSP classifiers. The LCDA classifier still 
performed slightly better than the FV and the OSP classifiers. This experiment shows that 
when background signatures are distinct from the target signatures, they may actually 
help the classifiers improve their performance. 




(a) blackbrush 




(b) creosote leaves 




(c) sagebrush 

Figure 12.2. Detection and classification results of the LCDA, FV and OSP classifiers (from left to right) 
with M = [blackbrush, creosote leaves, sagebrush, dry grass, red soil]. 

Once again, if we compare the results in Fig. 12.2 to those in Fig. 11.2, all the five 
classifiers performed comparably, though not identically. For the case of blackbrush, all five 
performed well. For the cases of creosote leaves and sagebrush, all five had difficulty 
detecting pixels with abundance fractions of 20% and 
40%. 
Example 12.2 (LCVF AVIRIS Image) 

In this example, the AVIRIS data in Fig. 1.6(a) were used for experiments. In this 
case, p = 5 and d = 224, which is the total number of bands. LCDA imposed a 
constraint that forces these five target signatures along five orthogonal directions. Fig. 
12.3 shows the results, where the images in the first, second, third and fourth columns were 
produced by the LCDA, the LDA, the FV and the OSP classifiers respectively. 




Figure 12.3. Results of LCDA, LDA, FV and OSP (first to fourth column) 



As we can see from Fig. 12.3, LCDA performed better 
than the LDA, FV and OSP classifiers in detection and classification of all five targets, 
specifically in classifying cinders, vegetation and shade. It should be noted that LDA 
performed as a pure pixel classifier rather than a mixed pixel classifier. As a result, the 
images produced by LDA are binary, as opposed to the gray scale abundance fractional images 
produced by the LCDA, the FV and the OSP classifiers. Therefore, it is not surprising to 
see that LDA produced the worst results. 
Example 12.3 (15-Panel HYDICE scene) 

As another example, we conducted experiments for LCDA using the 15-panel 
HYDICE scene in Fig. 1.7(a), where the target spectral signature of the panels in row i is 
represented by Pi shown in Fig. 1.8. The results are shown in Fig. 12.4, where the results 
of the LDA, FV and OSP classifiers are also included for comparison. 







Interestingly, LCDA and LDA performed very similarly and both outperformed the 
FV and the OSP classifiers. Like Tables 10.1-10.3, we can also calculate the detection 
and classification rates for LCDA by tallying the number of target pixels that were 
correctly detected and classified. Tables 12.1-12.4 show the detection results obtained by the 
LDA classifier and by the LCDA classifier using the 50%MPCV, 25%MPCV and 20%MPCV 
respectively. 
Table 12.1. Detection and classification rates of LDA 

          N_T    N_TD   R_D      R_C      N_FA 
P1        3      2      1.0000   1.0000   9 
P2        4      3      1.0000   1.0000   3 
P3        4      3      1.0000   1.0000   4 
P4        4      3      1.0000   1.0000   4 
P5        4      4      1.0000   1.0000   5 
Total     19     14     0.7368   0.7368   25 



Table 12.2. Detection and classification rates of LCDA using 50% abundance as the cut-off threshold 

          N_T    N_TD   R_D      R_C      N_FA 
P1        3      2      0.6667   0.6667   0 
P2        4      3      0.7500   0.7500   0 
P3        4      3      0.7500   0.7500   0 
P4        4      3      0.7500   0.7500   0 
P5        4      4      1.0000   1.0000   0 
Total     19     14     0.7368   0.7368   0 



Table 12.3. Detection and classification rates of LCDA using 25% abundance as the cut-off threshold 

         N_T    N_TD    R_D       R_C       N_FA
P1       3      2       0.6667    0.6667    0
P2       4      4       1.0000    1.0000    0
P3       4      4       1.0000    1.0000    0
P4       4      4       1.0000    1.0000    0
P5       4      3       0.7500    0.7500    0
Total    19     17      0.8947    0.8947    0


Table 12.4. Detection and classification rates of LCDA using 20% abundance as the cut-off threshold 

         N_T    N_TD    R_D       R_C       N_FA
P1       3      3       1.0000    1.0000    0
P2       4      4       1.0000    1.0000    0
P3       4      4       1.0000    1.0000    0
P4       4      4       1.0000    1.0000    0
P5       4      4       1.0000    1.0000    0
Total    19     19      1.0000    1.0000    0



As shown in these tables, the LDA classifier performed very well and achieved 100% 
detection and classification rates. However, it also produced many false alarm pixels, 25 
pixels in total. On the other hand, the performance of the LCDA classifier was 
determined by the cut-off threshold value. The detection and classification rates 
improved as the cut-off threshold was reduced from 25% to 20% while the number of 
false alarm pixels, N_FA, remained zero. In this case, the LCDA classifier performed better 
than the LDA classifier. Compared to Tables 4.1-4.3 and 11.1-11.3, the LCDA classifier 
also performed as well as TCIMF in detection of the 15 panels, but did better than 
TCIMF in that it produced no false alarm pixels. 
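
The rates listed in Tables 12.2-12.4 can be reproduced by a simple tally. The following minimal sketch assumes that the detection rate of a row is the number of detected target pixels divided by the number of target pixels in that row (the definition used for Tables 10.1-10.3 is not repeated here, so this ratio is an assumption). It uses the counts of Table 12.3; the names `detection_rates`, `n_t` and `n_td` are illustrative only.

```python
# Counts taken from Table 12.3 (LCDA, 25% abundance cut-off).
n_t = {"P1": 3, "P2": 4, "P3": 4, "P4": 4, "P5": 4}    # target pixels per panel row
n_td = {"P1": 2, "P2": 4, "P3": 4, "P4": 4, "P5": 3}   # target pixels detected per row


def detection_rates(n_t, n_td):
    """Return per-row and overall rates, assuming rate = detected / total."""
    rates = {row: n_td[row] / n_t[row] for row in n_t}
    rates["Total"] = sum(n_td.values()) / sum(n_t.values())
    return rates


rates = detection_rates(n_t, n_td)
for row, value in rates.items():
    print(f"{row}: {value:.4f}")
```

Under this assumption the overall rate is 17/19, which agrees with the total reported in Table 12.3.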

Analogous to Fig. 11.8, 3-D ROC curves and 2-D ROC curves can be plotted based 
on the target hit rate R_H, defined by (9.27), versus the false alarm rate via the a%MPCV 
defined by (9.3) for performance evaluation. Fig. 12.5 shows the 3-D and 2-D ROC 
curves generated by LCDA, FV and OSP. As we can see, LCDA performed significantly 
better than did FV and OSP. 

(a) 3-D ROC curves    (b) 2-D ROC curves 

Figure 12.5. ROC curves of LCDA, FV and OSP 



Table 12.5 tabulates the target hit rates by calculating the areas under their 2-D ROC 
curves in Fig. 12.5(b). 



Table 12.5. Detection rates produced by LCDA, FV and OSP 

         LCDA      OSP       FV
R_H      0.7548    0.4302    0.4578


12.6 CONCLUSIONS 

Fisher's linear discriminant analysis (LDA) has been well accepted as a major 
technique in pattern classification. It can also be applied to hyperspectral image 
classification. This chapter presents a similar but different approach to LDA, called 
LCDA, which replaces Fisher's ratio with the ratio of inter-distance to intra-distance as a 
criterion for optimality. The advantage of LCDA over LDA is that it constrains the class 
means along the desired orthogonal directions. Consequently, all the classes of interest 
are forced to separate along prescribed directions. By means of this direction 
constraint, LCDA can detect and classify similar targets. Interestingly, this chapter also 
demonstrates that the filter-vectors approach of Bowles et al., which was shown to be a 
special case of the BRLCMV classifier in Chapter 11, can also be interpreted as a special 
case of LCDA. In analogy with Table 11.4, Table 12.6 summarizes the assumptions and 
constraints that are imposed by LCDA in comparison with FV and OSP. 



Table 12.6. Assumptions/constraints imposed by LCDA, FV and OSP 

        linear mixture   desired       undesired     filter                   background
        model            targets (D)   targets (U)   vector w
LCDA    no/no            yes/yes       no/no         no/no       yes/no       no/no
FV      yes/yes          yes/yes       no/no         no/yes      no/no        no/no
OSP     yes/no           yes/no        no/no         no/no       no/no        no/no



A comparative study among the five classifiers discussed in Chapter 11 (LCMV, 
BRLCMV, MTCEM, SCEM and WTACEM) and the three classifiers described in this 
chapter (LCDA, FV and OSP) was also reported in Chang (2002b), where the eight 
classifiers were studied in a unified framework. 

Finally, it is also worth noting that LCDA can be extended to an unsupervised 
method for unknown image scenes using unsupervised algorithms in Chapter 5 when no 
a priori signature knowledge is available. The experimental results are very impressive 
and almost as good as their supervised counterparts. 




V 



AUTOMATIC MIXED PIXEL CLASSIFICATION 

(AMPC) 



In PART III and PART IV, we considered the problems of unconstrained mixed 
pixel classification and constrained mixed pixel classification. In PART V, we will 
further consider the problem of automatic mixed pixel classification (AMPC). Like 
automatic subpixel detection studied in Chapter 5, there are also two types of AMPC, 
unsupervised MPC and automatic target detection and classification (ATDC). The former 
applies when target knowledge is lacking: the three unsupervised algorithms described 
in Sections 5.2-5.4 of Chapter 5 can be used to generate the required unsupervised 
target knowledge and thereby extend the supervised MPC 
described in Chapters 7-11 to unsupervised MPC. The latter is similar to anomaly 
detection in Chapter 5 and does not require unsupervised target knowledge for 
classification. Rather, it detects and classifies anomalies in an unknown image scene. 
Therefore, technically speaking, unsupervised MPC and ATDC are considered to be 
different approaches. In Chapter 13, we first investigate unsupervised MPC and 
derive an unsupervised OSP approach to it. This method can be 
implemented in two different applications. One is the desired target detection and 
classification algorithm (DTDCA) that can be used to detect and classify specific targets 
provided by prior target knowledge. The other is the automatic target detection and 
classification algorithm (ATDCA) that can be used to detect and classify unknown 
targets. In Chapter 14, another type of AMPC, ATDC, is considered. It extends anomaly 
detection to anomaly classification. Although a simple approach combining a classifier 
with an anomaly detector may be desirable, two issues need to be addressed before doing 
so. First, since an anomaly detector does not discriminate among the anomalies it detects, 
we need a measure to discern the detected anomalies. Second, a classifier requires target 
information to be effective. In order to resolve these issues, four target discrimination 
measures are introduced in this chapter. These measures cluster the detected anomalies 
into different targets classes in an unsupervised manner. Then the means of each clustered 
target class are calculated to generate the desired target information for classification. So 
far, all the techniques presented in PART II-PART V assume that the target abundance 
fractions are unknown constants, i.e., nonrandom parameters. This does not 
have to be the case in general, specifically for hyperspectral imagery, which may uncover 
many unknown random signal sources. In Chapter 15, an approach to linear spectral 
random mixture analysis (LSRMA) is proposed where targets of interest are modeled as 
random signal sources. It is based on the concept of independent component analysis 
(ICA), which has shown great success in blind source separation in signal processing. This 
chapter provides a promising application of the ICA to hyperspectral image analysis. 
Chapter 16 presents an idea similar to that used in the LSRMA but a different approach, 
called projection pursuit (PP), which can be considered a general version of the LSRMA. It 
designs a projection index to explore interesting structures of image data. When the 
projection index is the data variance, the PP becomes the well-known principal 
components analysis (PCA). On the other hand, if the projection index is a measure of 
statistical independence among projection images, the PP is reduced to the LSRMA 
where each projection image corresponds to an independent component. One of the most 
challenging problems encountered in AMPC is the estimation of the intrinsic 
dimensionality of the data, which is generally unknown. The same problem was also 
encountered in automatic subpixel target detection. Chapter 17 is devoted to this topic 
where three Neyman-Pearson detection theory-based eigenvalue-thresholding techniques 
are developed for estimation of the intrinsic dimensionality. 
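
The remark that PP with the data variance as its projection index becomes PCA can be checked numerically. The sketch below is a minimal illustration on synthetic data, not the PP algorithm of Chapter 16: it compares the variance obtained along the leading eigenvector of the sample covariance matrix with the variance along a set of random directions. The function names and the synthetic data are assumptions made for this example.

```python
import numpy as np


def variance_index(data, w):
    """Projection index: variance of the data projected onto the unit vector along w."""
    w = w / np.linalg.norm(w)
    return float(np.var(data @ w))


def best_variance_projection(data):
    """Direction maximizing the variance index: leading eigenvector of the covariance."""
    centered = data - data.mean(axis=0)
    cov = centered.T @ centered / data.shape[0]
    eigvals, eigvecs = np.linalg.eigh(cov)     # eigenvalues in ascending order
    return eigvecs[:, -1], float(eigvals[-1])


# Synthetic "pixels x bands" data with unequal variances per band.
rng = np.random.default_rng(0)
data = rng.normal(size=(500, 5)) * np.array([3.0, 1.0, 0.5, 0.2, 0.1])

w_pca, top_eig = best_variance_projection(data)
random_dirs = rng.normal(size=(100, 5))
best_random = max(variance_index(data, w) for w in random_dirs)

print(variance_index(data, w_pca), top_eig, best_random)
```

No random direction can exceed the variance along the first principal component, which is why a variance-based projection index leads back to PCA.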




13 



AUTOMATIC MIXED PIXEL CLASSIFICATION 
(AMPC): UNSUPERVISED MIXED PIXEL 

CLASSIFICATION 



The automatic mixed pixel classification (AMPC) considered in this chapter is fully 
computer automated and can be implemented to automatically detect and classify targets 
with no human intervention. Like the automatic subpixel detection discussed in Chapters 
5-6, AMPC can also be categorized into unsupervised mixed pixel classification and 
anomaly classification. The former classifies mixed pixels in an unsupervised manner, 
where the required unsupervised target knowledge is the a posteriori target information 
generated directly from the image data as noted in Chapter 5. By contrast, the latter 
extends anomaly detection to anomaly classification, in which case the detected 
anomalies can be classified with no need of unsupervised target knowledge. Depending 
upon the availability of a priori target knowledge, two versions of unsupervised MPC, 
referred to as the desired target detection and classification algorithm (DTDCA) and 
the automatic target detection and classification algorithm (ATDCA), are presented in this 
chapter. The DTDCA applies when there is knowledge about the specific 
targets to be classified, whereas ATDCA can be used to classify targets of interest present 
in an unknown image scene without a priori target knowledge. As a consequence, they 
result in different applications. 



13.1 INTRODUCTION 

Automatic mixed pixel classification (AMPC) is part of automatic target 
recognition. One type of AMPC is considered in this chapter, namely unsupervised 
mixed pixel classification (unsupervised MPC). The need for unsupervised MPC arises 
because, in many practical applications, obtaining complete target knowledge is 
difficult, if not impossible. As shown in Chapter 8, when the complete prior target 
knowledge is available, the OSP classifier was the optimal linear Bayes classifier in the 
least-squares sense. However, when there is a lack of prior target knowledge, the OSP 
classifier can only produce sub-optimal solutions. The unsupervised MPC can improve 
the performance by taking advantage of additional a posteriori target information that is 
directly obtained from the image data by an unsupervised means. Such unsupervised 
target information helps the supervised MPC reduce interfering effects caused by 
unknown signal sources in the image data. So, in this chapter, the unsupervised MPC is 
considered. Of particular interest is an extension of OSP to unsupervised OSP. Two 
unsupervised OSP classifiers can be derived in accordance with different applications. 
The first unsupervised OSP classifier is referred to as the desired target detection and 
classification algorithm (DTDCA), which only requires the prior knowledge of specific 
targets of interest. Problems of this type occur in applications such as 
reconnaissance and search and rescue, where specific targets are known but image scenes 
are generally unknown. The second unsupervised OSP classifier is called the automatic target 
detection and classification algorithm (ATDCA), which is developed to search and 
classify unknown targets in an unknown environment. It is applied to the case that no 
prior target knowledge is available. Since there are no specific targets of interest, all the 
targets detected in the image data are considered to be interesting and must be classified. 
Interestingly, both DTDCA and ATDCA make use of the same unsupervised target 
generation process (UTGP) described in Section 5.3 to obtain the required unsupervised 
target information, although the initial targets used in the two algorithms are different. 
When the prior knowledge of specific targets is provided, these targets are used as initial 
targets to initialize DTDCA, which then classifies these particular targets. When no such 
prior target knowledge is available, the initial targets must be obtained directly from the 
image data to initialize ATDCA. There is a significant difference between DTDCA and 
ATDCA. DTDCA considers the targets generated by UTGP as undesired targets and 
eliminates these UTGP-generated targets to improve its performance. By contrast, 
ATDCA does not have any prior target knowledge. As a result, the targets found by 
UTGP are considered to be interesting targets, all of which must be classified. Therefore, 
ATDCA can be used for anomaly detection, while DTDCA cannot. 



13.2 UNSUPERVISED MPC 

The mixed pixel classifiers discussed in Chapters 8-12 required either partial or full 
prior target knowledge. They can be viewed as partially or completely supervised 
classifiers. A completely supervised optimal classifier is a Bayes classifier that requires 
the complete knowledge of the image data. In reality, this is impossible in many 
applications, particularly for hyperspectral image analysis. On the other hand, a 
completely unsupervised classifier such as ISODATA, which is widely used in pattern 
classification (Duda and Hart, 1973), generally performs poorly as was shown in Chapter 
9 or in Chang and Ren (2000). This is because it is a pattern classification technique, 
which is not designed for target classification. A good example of an approach between 
these two extremes is CEM in Chapter 4, 
which applies partial knowledge to target detection, while suppressing energies resulting 
from unknown signal sources. It is further improved by TCIMF, which includes 
unsupervised target knowledge that provides information of unknown signal sources 
generated by an unsupervised algorithm. As a result, TCIMF eliminates these targets 
instead of minimizing the energies of these targets as is done in CEM. A similar 
improvement can be expected for unsupervised MPC. In the following sections, we 
consider two unsupervised MPC methods. 



13.3 DESIRED TARGET DETECTION AND CLASSIFICATION 

In many applications, there always exists some partial a priori knowledge, which we 
should be able to take advantage of. For example, in reconnaissance applications, we may 
be interested only in some specific targets present in an unknown image scene, which 
may include unknown signal sources that have interfering effects on the desired targets. 
In order to reduce such interference, we can find these potential interferers and remove 
them before detection takes place. Since there is no knowledge about such interferers, it 
must be obtained directly from the image scene by an unsupervised method. The 
algorithm referred to as the desired target detection and classification algorithm (DTDCA) is 
developed for this purpose. It uses the unsupervised algorithm UTGP, described in 
Section 5.3, to generate a collection of interfering signal sources that will be eliminated 
by the follow-up OSP-based classifier. 

Desired Target Detection and Classification Algorithm (DTDCA) 

1) Select $\mathbf{t}_0$ to be the target of interest. 

2) Use $\mathbf{t}_0$ as the initial target in UTGP in Section 5.3 to generate a set of targets 
denoted by $\mathbf{U}_k$. 

3) Use $\delta_{OSP}(\mathbf{r})$ specified by (3.9) to classify $\mathbf{t}_0$. 

It should be noted that when we classify target $\mathbf{t}_0$, any target other than $\mathbf{t}_0$ will be 
considered as an interferer or undesired target to target $\mathbf{t}_0$, no matter what this target 
is, a man-made object or a natural background signature. In this case, the generated target 
set $\mathbf{U}_k$ will be the same U used in (3.2). Next, we apply the OSP projector $P_{\mathbf{U}_k}^{\perp}$ 
to all image pixel vectors r for classification. The resulting image will show only target 
$\mathbf{t}_0$ with all the target signatures in $\mathbf{U}_k$ being nulled out. In general, if there is a set of 
desired targets of interest, D, $\mathbf{t}_0$ in step 1) can be replaced with those targets in D. 
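
The three steps above can be sketched in a few lines of NumPy. Two assumptions are made, because Section 5.3 and equation (3.9) are not reproduced here: UTGP is taken to select, at each step, the pixel with the largest residual energy after projecting out the initial target and all targets found so far, and the OSP classifier is taken to be the desired signature applied after the orthogonal-complement projector of the undesired targets. The function names and the toy scene are hypothetical.

```python
import numpy as np


def osp_projector(U):
    """P = I - U U^+ : projects onto the orthogonal complement of the columns of U."""
    return np.eye(U.shape[0]) - U @ np.linalg.pinv(U)


def utgp(pixels, t0, num_targets):
    """Assumed UTGP: repeatedly pick the pixel with the largest residual energy
    after projecting out t0 and all targets found so far."""
    found, indices = [np.asarray(t0, dtype=float)], []
    for _ in range(num_targets):
        P = osp_projector(np.column_stack(found))
        residual = pixels @ P                  # row i is (P r_i)^T since P is symmetric
        k = int(np.argmax(np.sum(residual**2, axis=1)))
        indices.append(k)
        found.append(pixels[k])
    return indices


def dtdca(pixels, t0, num_targets):
    """OSP output t0^T P r for every pixel r, with the undesired set built by UTGP."""
    t0 = np.asarray(t0, dtype=float)
    U = pixels[utgp(pixels, t0, num_targets)].T    # bands x generated targets
    return pixels @ osp_projector(U) @ t0


# Toy scene (hypothetical data): 4 pixels, 4 bands.
pixels = np.array([[1.0, 0.0, 0.0, 0.0],    # pure desired target
                   [0.0, 2.0, 0.0, 0.0],    # interferer A
                   [0.0, 0.0, 3.0, 0.0],    # interferer B
                   [0.5, 1.0, 0.0, 0.0]])   # mixture of the target and A
generated = utgp(pixels, [1.0, 0.0, 0.0, 0.0], num_targets=2)
scores = dtdca(pixels, t0=[1.0, 0.0, 0.0, 0.0], num_targets=2)
print(generated, scores)
```

In this toy scene the two interferers are generated as undesired targets and nulled out, so only the pure and the mixed target pixels keep a nonzero output.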

In the following example, we will show how detection performance can be improved 
by incorporating unsupervised target information. 

Example 13.1 (LCVF AVIRIS Image) 

In this example, the LCVF AVIRIS scene in Fig. 1.6(a) was used for experiments 
where playa, cinders, vegetation and rhyolite were targets of interest and each of these 
four target signatures was used as the desired target signature $\mathbf{t}_0$ for DTDCA. Since the 
only assumed target knowledge was $\mathbf{t}_0$ and no other target information was available, 
DTDCA had to generate additional target information that 
could be used to improve the detection and classification performance. Fig. 13.1 shows 
the results of DTDCA in detection and classification of playa, cinders, vegetation and 
rhyolite. In each of the four cases, 12 unknown targets were generated by UTGP in step 2) of 
DTDCA. These UTGP-generated targets were considered to be undesired targets and 
joined with $\mathbf{t}_0$ to form the target signature matrix $\mathbf{M}$ for the OSP classifier. 

The images labeled by (a) in Fig. 13.1 were produced by the OSP classifier with 
$\mathbf{M} = [\mathbf{t}_0\ \mathbf{U}_1]$ with $\mathbf{U}_1 = \mathbf{t}_1$, where $\mathbf{t}_1$ was the first unknown target generated by UTGP and 
the $\eta_1$ underneath the images was the OPCI defined in (5.3). Then the images labeled by 
(b) in Fig. 13.1 were produced by the OSP classifier with $\mathbf{M} = [\mathbf{t}_0\ \mathbf{U}_2]$ with 
$\mathbf{U}_2 = [\mathbf{t}_1\ \mathbf{t}_2]$, where $\mathbf{t}_2$ was the second unknown target generated by UTGP and the $\eta_2$ 
underneath the images was the OPCI defined in (5.4). The performance of DTDCA was 
improved by eliminating one more undesired target, $\mathbf{t}_2$. The UTGP was continued until 
the 12th target was generated, where $\mathbf{M} = [\mathbf{t}_0\ \mathbf{U}_{12}]$ with $\mathbf{U}_{12} = [\mathbf{t}_1\ \mathbf{t}_2 \cdots \mathbf{t}_{12}]$. As we can 
see from Fig. 13.1, the improvement of DTDCA was not visible after the first few undesired 
targets were eliminated. This is largely due to the fact that the image has low spatial 
resolution (20 m) and target signatures may be mixed with background signatures. As 
will be shown in Example 13.2, this will be changed when a HYDICE image is used. 

In the above experiment, DTDCA was used to detect and classify the four desired 
targets, which were assumed to be present in the image scene. However, in some 
applications, the desired target may not be present in the image to be processed. Fig. 
13.2 shows how DTDCA works in such cases. 

The image used for this experiment is shown in Fig. 13.2(a); it was extracted from 
Fig. 3.1(a) and did not contain the playa. In this case, DTDCA used the playa 
as the desired target and attempted to detect and classify the playa in the scene. 
As we can see from the images, DTDCA may show some response at the 
beginning (see Fig. 13.2(b-e)). But as more unknown targets were eliminated, the 
OSP-classified images became darker and showed no indication that the desired target was 
present in the image scene. At this point, we can conclude that no desired target was 
found in the image scene. It should be noted that DTDCA can be very useful in 
reconnaissance applications when one searches for a known target in an unknown 
environment. 

Example 13.2 (15-Panel HYDICE scene) 

In Example 13.1, the AVIRIS experiments did not show clear advantages of 
DTDCA. In this example, we will use the 15-panel HYDICE image in Fig. 1.7(a) to 
show the effectiveness of DTDCA, which was not demonstrated in Example 13.1. In 
analogy with Fig. 13.1, Fig. 13.3 shows the results of DTDCA in detection and 
classification of 15 panels where 24 unknown targets were generated by UTGP. The 
desired target signatures used in the DTDCA were $\{P_i\}_{i=1}^{5}$ shown in Fig. 1.8. Since the 
panels in the same row are made from the same material, they were all classified into one 
class as shown in Fig. 13.3. Unlike the AVIRIS images in Fig. 13.1, the HYDICE 
images in Fig. 13.3 clearly showed the effectiveness of DTDCA. According to the spectral 
analysis in Chapter 2, the spectral signatures of the panels in rows 2 and 3 are very 
similar, and so are the spectral signatures of the panels in rows 4 and 5. It is generally 
difficult to classify them into different classes. DTDCA seemed to have the same 
difficulty at the beginning of the process, where the panels in both rows were shown in 
the same image (see detection of panels in row 3 and detection of panels in row 5). 
However, as more unknown targets were generated and eliminated, the panels in the 
desired row were enhanced while the panels in the other row began to vanish. 
Eventually, only the panels in the desired row were detected and correctly classified in a 
single image. 

In order to further show the utility of DTDCA in searching for a known target not 
present in an image scene, we selected a HYDICE scene shown in Fig. 13.4(a) in which 
no P1 signature was present. Fig. 13.4(b-l) demonstrates the step-by-step process of 
DTDCA, where the images became completely dark after the first four unknown targets 
were generated and eliminated. This implies that no desired target was detected in the 
image scene. 

To conclude this section, one comment is noteworthy. As mentioned in Section 5.3, 
there are two ways to terminate UTGP. One is to set a threshold for OPCI. The other is to 
predetermine the number of unknown targets to generate. Which one is better 
depends on the application. In the examples considered in this section, we chose the latter 
and preset 12 targets for Example 13.1 and 24 targets for Example 13.2. 



13.4 AUTOMATIC TARGET DETECTION AND CLASSIFICATION 

In DTDCA, we assumed that there were specific targets of interest. DTDCA used 
these targets as desired targets and nulled out all other unknown targets that are generated 
by UTGP prior to target detection. In this section, we assume that such target knowledge 
is not available. In this case, no specific targets are of particular interest. The algorithm 
designed for this purpose is referred to as automatic target detection and classification 
algorithm (ATDCA). It is very similar to DTDCA but differs in two aspects. First of all, 
there is no desired target to serve as an initial target that can be used to initialize the 
algorithm. It must be generated directly from the image scene. Second, since we do not 
have any target knowledge, we need to classify all the targets that are generated by the 
UTGP. As a result, each target will be classified in a separate image individually. 

Automatic Target Detection and Classification Algorithm (ATDCA) 

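1) Select $\mathbf{t}_0 = \arg\{\max_{\mathbf{r}}[\mathbf{r}^T\mathbf{r}]\}$. 

2) Use $\mathbf{t}_0$ as the initial target in step 1 of UTGP. 

3) Follow steps 2-8 outlined in UTGP to generate $\mathbf{U}_k$. 

4) Apply $\delta_{OSP}(\mathbf{r})$ specified by (3.9) to classify $\mathbf{t}_0$ and all targets in $\mathbf{U}_k$ individually. 

More specifically, assume $T_k = \{\mathbf{t}_0\} \cup \mathbf{U}_k$ in model (3.1). Apply the OSP classifier 
$\delta_{OSP}(\mathbf{r}) = \mathbf{t}_i^T P_{\mathbf{U}_i}^{\perp}\mathbf{r}$ to classify all individual targets $\mathbf{t}_i \in T_k$ with 
$\mathbf{U}_i = [\mathbf{t}_0\ \mathbf{t}_1 \cdots \mathbf{t}_{i-1}\ \mathbf{t}_{i+1} \cdots \mathbf{t}_k]$. Since there are $k+1$ targets in $T_k$, $k+1$ images 
will be generated, each of which shows only one particular target in $T_k$ 
classified by the OSP classifier. 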
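
A minimal sketch of the four steps follows. As Section 5.3 is not reproduced here, UTGP is assumed to pick, at each step, the pixel with the largest residual energy after projecting out all targets found so far; each target is then classified with all the other generated targets treated as undesired. The function names and the toy scene are hypothetical.

```python
import numpy as np


def osp_projector(U):
    """P = I - U U^+ : projects onto the orthogonal complement of the columns of U."""
    return np.eye(U.shape[0]) - U @ np.linalg.pinv(U)


def atdca(pixels, num_targets):
    """Return the indices of the targets found and one OSP image per target."""
    # Step 1: the initial target is the pixel with maximum vector length r^T r.
    indices = [int(np.argmax(np.sum(pixels**2, axis=1)))]
    # Steps 2-3 (assumed UTGP): grow the target set by maximum residual energy.
    for _ in range(num_targets):
        P = osp_projector(pixels[indices].T)
        residual = pixels @ P                  # row i is (P r_i)^T since P is symmetric
        indices.append(int(np.argmax(np.sum(residual**2, axis=1))))
    # Step 4: classify each target with all other targets as undesired signatures.
    images = []
    for i, idx in enumerate(indices):
        others = [j for n, j in enumerate(indices) if n != i]
        P = osp_projector(pixels[others].T)
        images.append(pixels @ P @ pixels[idx])
    return indices, np.array(images)


# Toy scene (hypothetical data): 4 pixels, 4 bands.
pixels = np.array([[1.0, 0.0, 0.0, 0.0],
                   [0.0, 2.0, 0.0, 0.0],
                   [0.0, 0.0, 3.0, 0.0],
                   [0.5, 1.0, 0.0, 0.0]])
indices, images = atdca(pixels, num_targets=2)
print(indices)
print(images)
```

Each row of `images` plays the role of one classified image: it responds only to pixels containing the corresponding target, because the other generated targets have been nulled out.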

In order to compare to DTDCA, we conduct the same experiments used for 
Examples 13.1 and 13.2 and demonstrate the difference between DTDCA and ATDCA. 

Example 13.3 (LCVF AVIRIS scene) 

With the assumption that no prior target knowledge about the image is available, the 
initial target $\mathbf{t}_0$ must be obtained directly from the scene by the UTGP; it turned out to be a 
playa signature. Fig. 13.5(a) shows the classification result of $\mathbf{t}_0$. This is followed by 
the classification result of the first target, $\mathbf{t}_1$, generated by the UTGP, which was a cinder 
pixel. Then the second target, $\mathbf{t}_2$, found by the UTGP was a vegetation pixel and its 
classification result is shown in Fig. 13.5(c). The process was continued until a stopping 
criterion was met. In this case, the same criterion (i.e. 12 targets) that was used to 
terminate DTDCA was also set to terminate ATDCA. As we examine the images in Fig. 
13.5(a-l), playa, rhyolite, shade and anomaly were classified in Figs. 13.5(a,h,j), 13.5(i), 
13.5(d,l) and 13.5(e,f), respectively. Interestingly, ATDCA detected a single two-pixel 
wide anomaly, which cannot be detected by DTDCA since the anomaly was not 
visible in the image scene and cannot be identified by visual inspection. 

It is worth noting that since the playa covers a large area, its spectral signature 
varies. As a result, the playa was detected in three separate images in Fig. 
13.5(a,h,j). Similarly, shade and anomaly were each classified in two 
separate images, Fig. 13.5(d,l) and Fig. 13.5(e,f), respectively. The same phenomenon 
was also observed in Chang and Heinz (2000). 

Due to a lack of target knowledge, ATDCA cannot compete against DTDCA, which 
uses the desired target knowledge. It should be also noted that the image sequences 
shown in Figs. 13.1-13.4 were the progressive improvement of DTDCA in detection of 
the desired targets, whereas the image sequence in Fig. 13.5 showed the classification 
results of specific targets found by the UTGP in sequence. 

Example 13.4 (15-Panel HYDICE scene) 

In this example, we once again used the 15-panel HYDICE scene for experiments to 
better illustrate the difference between DTDCA and ATDCA. Fig. 13.6(a-l) shows the 
classification results of 12 targets found by the UTGP in sequence. The initial target $\mathbf{t}_0$ 
was found at the upper left corner and is a strong interferer, which was also detected in 
Fig. 6.5(a,e). Then the first target $\mathbf{t}_1$ was found to be a background signature, which was 
a grass pixel. It was then followed by the second target found, $\mathbf{t}_2$, which was the first 
panel pixel in row 5. As mentioned previously, the spectral signatures of panels in rows 4 
and 5 are very close. This resulted in the classification in Fig. 13.6(c), which classified all 
panels in these two rows in one image. In this case, we could not discriminate panels in 
row 5 from those in row 4. A similar result was also obtained in Fig. 13.6(e) where the 
fourth UTGP-generated target $\mathbf{t}_4$ was the first panel pixel in row 3. The classification 
could not distinguish between the panels in row 2 and the panels in row 3. Since 
the panels in row 1 have very distinct spectral signatures from those in other rows, they 
were classified in Fig. 13.6(f) in a separate image where the first panel pixel in row 1 was 
detected as the fifth target, $\mathbf{t}_5$. Compared to the results obtained in Fig. 13.3, 
DTDCA outperformed ATDCA in classification of the 15 panels. In particular, 
all panels in different rows were successfully separated and correctly classified by the 
DTDCA. This is because the target signatures used by DTDCA were the average 
signatures obtained in Fig. 1.8, whereas the target knowledge used in ATDCA was 
single-pixel target information, which may not represent well the spectral signatures of 
panels in the same row. Nevertheless, this problem can be mitigated by replacing a single 
target pixel with the average of a group of similar target pixels found by a spectral 
similarity measure described in Chapter 2. 
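
The averaging idea mentioned above can be sketched as follows, using the spectral angle as the similarity measure. The angle threshold, the function names and the toy data are assumptions chosen for illustration; the measures of Chapter 2 are not reproduced here.

```python
import numpy as np


def spectral_angle(x, y):
    """Angle (in radians) between two spectral signatures."""
    cos = np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y))
    return float(np.arccos(np.clip(cos, -1.0, 1.0)))


def averaged_signature(pixels, target, max_angle):
    """Mean of all pixels whose spectral angle to the target is at most max_angle."""
    angles = np.array([spectral_angle(p, target) for p in pixels])
    return pixels[angles <= max_angle].mean(axis=0)


# Toy data (hypothetical): the single-pixel target is the first pixel.
pixels = np.array([[1.0, 0.0],
                   [2.0, 0.1],    # nearly the same direction as the target
                   [0.0, 1.0]])   # a different material
target = pixels[0]
signature = averaged_signature(pixels, target, max_angle=0.1)
print(signature)
```

Since the target pixel itself always has zero angle, the averaged group is never empty when the target is drawn from the image.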




13.5 CONCLUSIONS 



This chapter presents two unsupervised MPC methods, DTDCA and ATDCA. 
Although they look alike, their applications are completely different. DTDCA can 
be used for cases in which some partial prior knowledge is available, such as reconnaissance 
applications. On the other hand, the ATDCA works for completely unknown data where 
no a priori information is given. In this case, a posteriori information is obtained 
directly from the data for target detection and classification. Applications of this type 
include surveillance or monitoring of unknown targets. 




14 



AUTOMATIC MIXED PIXEL CLASSIFICATION (AMPC): 

ANOMALY CLASSIFICATION 



In Chapter 13, one type of AMPC, the unsupervised MPC, was considered. It made 
use of an unsupervised algorithm to generate the unsupervised target information 
required for unsupervised MPC. In this chapter, another type of AMPC, automatic target 
detection and classification (ATDC), is investigated, which does not require any prior 
target knowledge. It extends anomaly detection to anomaly classification. A natural 
approach is to combine an anomaly detector with a classifier to classify the detected 
anomalies. Unfortunately, it is not that simple. Two issues need to be addressed before 
doing so. Since detected anomalies do not necessarily belong to the same target class, a 
mechanism is required to discern among these targets. In addition, a classifier needs the 
knowledge of target classes for classification. In order to cope with these two problems, 
an automatic thresholding method and four target discrimination measures are introduced 
in this chapter. The proposed automatic thresholding method is developed to segment 
anomalies from image background before target discrimination takes place. It can be 
implemented automatically. The target discrimination measures are designed based on 
two criteria, the Mahalanobis distance and matched-filter based distance, which can be 
used to cluster the detected anomalies into different target classes in an unsupervised 
manner. The mean of each clustered target class is calculated to generate the required 
target information for that particular target class to be used for classification. Coupled 
with the automatic thresholding and a target discrimination measure, anomaly detection 
can be extended to anomaly classification. 



14.1 INTRODUCTION 

Anomaly detection requires no information at all to detect targets. Accordingly, it 
cannot discriminate the targets that it detects from one another. To turn anomaly detection 
into anomaly classification, the detection must be implemented in conjunction with an 
effective criterion to classify detected targets. For clarity, we make a subtle distinction 
between "target classification" and "target discrimination". Target discrimination allows 
one to discern among a set of targets of interest, denoted by T, one from another. The 
targets in T could belong to the same class but have different spectral features. Or they 
could be targets of distinct types that belong to different classes. In both cases, target 
discrimination does not necessarily classify the targets in T. On the other hand, target 
classification clusters or groups targets of interest, T, in accordance with a certain set of 
rules, such as the minimum distance in Section 8.3 or the nearest neighbor rule in Section 5.2. 
Such a clustering process can be carried out by either a supervised or an unsupervised 
means. The information generated by each cluster is then used for classification. As a 
result, a classifier may uncover targets in the image data which are not in T, but may 
possess spatial or spectral properties similar to those of some targets in T. More 
specifically, target classification takes advantage of the knowledge of T to produce a 
posteriori information which can be used for classification, whereas target discrimination 
is performed only within a specific set of targets, T. For anomaly classification, the set 
of targets of interest, T, is produced by anomaly detection. The targets in T are then 
discriminated by a target discrimination measure to form a set of distinct target classes. 
The mean of each distinct target class is subsequently calculated and used as the target 
knowledge for classification. Using these generated class means, a classifier can classify 
targets in the entire image data regardless of whether or not these targets are in T. 
Therefore, generally speaking, anomaly classification can be accomplished in three 
stages: anomaly detection in the first stage, followed by target discrimination in the 
second stage and concluded by classification in the third stage. Among these three 
sequential stages, anomaly detection and classification have already been discussed in 
Chapter 6 and Chapters 8-10 respectively. Target discrimination is the only stage yet to 
be developed and is the focus of this chapter. 

In general, any hyperspectral measure presented in Chapter 2, such as the spectral angle 
mapper (SAM) or the spectral information divergence (SID), can be used for target 
discrimination. Unfortunately, these measures are developed on a single pixel basis and 
do not take advantage of spectral correlation among pixel vectors as does the anomaly 
detection proposed in Chapter 6. In order to take into account such sample spectral 
correlation, the Mahalanobis distance specified by (9.9) and the Bhattacharyya distance 
specified by (9.10) can be included to derive second-order statistics hyperspectral 
measures. Several such measures discussed in Chang (1998b) shed light on this 
approach. Following a similar idea, we derive four target discrimination measures from 
two criteria, the Mahalanobis distance and a matched filter-based distance. Interestingly, 
all four target discrimination measures are also very closely related to CEM in Chapter 4 
and the anomaly detector, RXD, in Chapter 6. The experimental results also show that 
these four measures perform nearly the same. 



14.2 TARGET DISCRIMINATION MEASURES 

As recalled in Chapter 9, the Mahalanobis distance defined by (9.9) is derived from a 
Gaussian kernel which measures the distance between a target signature $x$ and the 
Gaussian mean $m_j$ with inclusion of the covariance matrix to take care of sample 
correlation. The Bhattacharyya distance defined by (9.10) classifies two target signatures, 
$m_i$ and $m_j$, using their respective class sample covariance matrices, $\Sigma_i$ and 
$\Sigma_j$, to account for second-order statistics. If we assume that 
$\Sigma_i = \Sigma_j = \Sigma$, the Bhattacharyya distance is reduced, up to a constant 
scale factor, to 

$$(m_i - m_j)^T \Sigma^{-1} (m_i - m_j) \qquad (14.1)$$

As noted, (14.1) has the identical structure to that of RXD, 
$(x - \mu)^T K_{L \times L}^{-1} (x - \mu)$ with $x = m_i$, $\mu = m_j$ and 
$K_{L \times L} = \Sigma$, specified by (6.1). If we replace $m_i$ and $m_j$ with two 
target pixels $t_i$ and $t_j$ and $\Sigma$ with $K_{L \times L}$ in (14.1), the resulting 
distance 

$$\mathrm{CMD}(t_i, t_j) = (t_i - t_j)^T K_{L \times L}^{-1} (t_i - t_j) \qquad (14.2)$$

referred to as covariance-Mahalanobis distance (CMD), can be used to measure the 
discrepancy between two target pixels $t_i$ and $t_j$, both of which share the same sample 
covariance matrix, $K_{L \times L}$, that is formed by the image data. According to (14.2), 
the smaller the value is, the harder the discrimination between the two targets is. 
However, we should bear in mind that the Mahalanobis distance is effective when the number 
of samples that form the sample covariance matrix is sufficiently large. It should be 
noted that for simplicity we have used target pixels and target signatures interchangeably. 

The CMD specified by (14.2) is not the only distance measure that can be used for 
target discrimination. An alternative distance can also be derived from the concept of 
a matched filter as follows 

$$\mathrm{CMFD}(t_i, t_j) = t_i^T K_{L \times L}^{-1} t_j \qquad (14.3)$$

which is referred to as covariance-matched filter based distance (CMFD), where 
$K_{L \times L}$ is the sample covariance matrix. It is worth noting that a major 
difference between the Mahalanobis distance specified by (9.9) and CMFD is that the 
former uses the same variable $x$ on both sides of the matrix inverse, whereas the latter 
uses two different target pixels, $t_i$ and $t_j$. 

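Additionally, if the inverse of the sample covariance matrix, $K_{L \times L}^{-1}$, in 
(14.2)-(14.3) is replaced by the inverse of the sample correlation matrix, 
$R_{L \times L}^{-1}$, two more alternative measures can be obtained from (14.2)-(14.3) by 

$$\mathrm{RMD}(t_i, t_j) = (t_i - t_j)^T R_{L \times L}^{-1} (t_i - t_j) \qquad (14.4)$$

referred to as correlation-Mahalanobis distance (RMD) and 

$$\mathrm{RMFD}(t_i, t_j) = t_i^T R_{L \times L}^{-1} t_j \qquad (14.5)$$

referred to as correlation-matched filter based distance (RMFD). 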
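
As an illustration only, the four measures can be sketched in Python with NumPy. The sketch assumes the quadratic form $(t_i - t_j)^T M^{-1} (t_i - t_j)$ for the Mahalanobis-type measures and the bilinear form $t_i^T M^{-1} t_j$ for the matched filter-type measures, with $M$ taken to be either the sample covariance matrix or the sample correlation matrix. The function names, the 1/N normalization and the synthetic data are assumptions made for this example and do not come from the text.

```python
import numpy as np

def sample_matrices(pixels):
    """Return (mu, K, R) for an (N, L) array of pixel vectors."""
    X = np.asarray(pixels, dtype=float)
    n = X.shape[0]
    mu = X.mean(axis=0)
    K = (X - mu).T @ (X - mu) / n      # sample covariance matrix (L x L)
    R = X.T @ X / n                    # sample correlation matrix (L x L)
    return mu, K, R

def mahalanobis_measure(ti, tj, M):
    """(ti - tj)^T M^{-1} (ti - tj): M = K for CMD, M = R for RMD."""
    diff = np.asarray(ti, dtype=float) - np.asarray(tj, dtype=float)
    return float(diff @ np.linalg.solve(M, diff))

def matched_filter_measure(ti, tj, M):
    """ti^T M^{-1} tj (assumed form): M = K for CMFD, M = R for RMFD."""
    ti = np.asarray(ti, dtype=float)
    tj = np.asarray(tj, dtype=float)
    return float(ti @ np.linalg.solve(M, tj))

# Synthetic stand-in for image data: 500 pixels, 4 bands (not real imagery).
rng = np.random.default_rng(0)
image = 5.0 + rng.normal(size=(500, 4))
mu, K, R = sample_matrices(image)
t_a, t_b = image[0], image[1]

cmd = mahalanobis_measure(t_a, t_b, K)
cmfd = matched_filter_measure(t_a, t_b, K)
rmd = mahalanobis_measure(t_a, t_b, R)
rmfd = matched_filter_measure(t_a, t_b, R)
print(f"CMD={cmd:.3f} CMFD={cmfd:.3f} RMD={rmd:.3f} RMFD={rmfd:.3f}")
```

Solving a linear system with `np.linalg.solve` avoids forming the matrix inverse explicitly, which is usually the more stable choice when the number of bands is large.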

Here, (14.4) is an alternative form of the Mahalanobis distance. However, we should pay 
particular attention to (14.5). It is very close to the form of CEM specified by (4.8) 
where $t_i$ is designated as the desired target signature d and $t_j$ can be viewed as the 
image pixel vector r to match the desired target $t_i$. So, the larger the matched value 
is, the more likely the two targets belong to the same class. It basically performs a 
similar task to that of a matched filter. This is the reason why the target discrimination 
measures specified by (14.3) and (14.5) are referred to as matched-filter distance measures. 



14.3 ANOMALY CLASSIFICATION 

The purpose of a target discrimination measure is to differentiate the targets detected 
by an anomaly detector and separate them into different classes. These distinct target 
classes will be used to generate target information required for a classifier. In other 
words, these target classes can be used as training sample classes to generate a posteriori 
information needed for classification. Basically, an anomaly classifier is made up of three 
operators in sequence, an anomaly detector, followed by a target discriminator and 
concluded by a classifier. The algorithm to implement an anomaly classifier can be 
summarized as follows. 

Anomaly Classification Algorithm 

1) Apply an anomaly detector to detect potential anomalous targets, $\{t_j\}$. 

2) Use one of the target discrimination measures specified by (14.2)-(14.5) to group the 
targets detected in step 1) into separate target classes, $\{\omega_k\}$. 

3) Find the mean of each target class, $\{\bar{t}_k\}$. 

4) Apply a classifier to classify targets in the image data using $\{\bar{t}_k\}$ as the 
desired target information. The classifier used in this step can be any classifier, such 
as the OSP classifier specified by (3.9) or the LCMV classifier specified by (11.2). In 
this step, the classification is done based on the target information provided in step 3). 
As a result, the targets that match $\{\bar{t}_k\}$ will be detected and also classified. 
These classified targets may also include targets which were missed by the anomaly 
detector in step 1). 
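
Steps 3) and 4) can be sketched as follows, assuming the class labels from step 2) are already available and using a CEM-type filter $w = R^{-1} d / (d^T R^{-1} d)$ as the classifier, which is only one of the choices the algorithm permits. The function names and the synthetic data are hypothetical and serve only to make the example runnable.

```python
import numpy as np

def class_means(targets, labels):
    """Step 3: mean vector of each target class (labels are integer class ids)."""
    targets = np.asarray(targets, dtype=float)
    labels = np.asarray(labels)
    return {int(k): targets[labels == k].mean(axis=0) for k in np.unique(labels)}

def cem_filter(pixels, d):
    """Step 4 (one option): CEM-type weights w = R^{-1} d / (d^T R^{-1} d)."""
    X = np.asarray(pixels, dtype=float)
    R = X.T @ X / X.shape[0]            # sample correlation matrix of the image
    Rinv_d = np.linalg.solve(R, d)
    return Rinv_d / (d @ Rinv_d)

# Synthetic scene: 10 target pixels followed by 400 background pixels, 5 bands.
rng = np.random.default_rng(1)
signature = np.array([1.0, 2.0, 0.5, 3.0, 1.0])
target_pixels = signature + 0.05 * rng.normal(size=(10, 5))
background = np.ones(5) + 0.05 * rng.normal(size=(400, 5))
image = np.vstack([target_pixels, background])

means = class_means(target_pixels, np.zeros(10, dtype=int))  # a single class here
w = cem_filter(image, means[0])
outputs = image @ w        # filter response for every image pixel
print("mean response on targets:   ", outputs[:10].mean())
print("mean |response| on background:", np.abs(outputs[10:]).mean())
```

Because the filter is applied to every image pixel, pixels that resemble a class mean respond strongly even if the detector in step 1) did not flag them.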



14.4 AUTOMATIC THRESHOLDING METHOD 

In view of the fact that the images generated by RXD are generally gray scale, the 
detection is usually carried out by visual inspection. However, in order to avoid such 
human interpretation and to make an objective assessment, we need to develop a 
computer-automated thresholding method that converts a gray scale image to a binary 
image where the detected targets can be extracted from the image background. 

Recalling (14.2), RXD operates in a form which allows one to detect anomalies in a 
large background by finding high peaks of gray levels in homogeneous regions. 
Therefore, the larger the gray values of the pixels are, the more likely the pixels are 
anomalous pixels. This suggests that the gray level values of anomalies should behave as 
outliers and fall in the right tail of the image distribution. For a given value $a$, we 
define a rejection region, denoted by $R(a) = \{ r \mid \delta_{RXD}(r) < a \}$, as the set 
made up of all the image pixels in the RXD-detected image whose gray level values are less 
than $a$. We use the histogram of the RXD-detected image to define the rejection 
probability, $P(a)$, as 

$$P(a) = \Pr(R(a)). \qquad (14.6)$$

Then a threshold used for anomaly detection can be determined by setting a 
confidence coefficient $\gamma$ such that $P(a_\gamma) = \gamma$. If 
$\delta_{RXD}(r) > a_\gamma$, then $r$ is detected as an anomaly. For example, Fig. 14.1(a) 
plots the gray level values of Fig. 6.5(a) which was produced by RXD for the 15-panel 
HYDICE scene and Fig. 14.1(b) shows its corresponding histogram. If the confidence 
coefficient was set to $\gamma = 0.99$, the corresponding threshold was found to be 484.75 
as shown in Fig. 14.1(c). 

Figure 14.1. (a) Plot of the gray values produced by the RXD generated image; (b) histogram of (a); (c) An 
enlargement of right tail in (b). 



Using this $a_\gamma$ to threshold the RXD-detected gray scale image in Fig. 6.5(a) 
resulted in a binary-thresholded image shown in Fig. 14.2 where there were 39 target 
pixels, $\{t_j\}_{j=1}^{39}$, segmented by $a_\gamma = 484.75$. 




Figure 14.2. A binary-thresholded image of Fig. 14.1(b) resulting from the automatic thresholding method 
with the confidence coefficient $\gamma = 0.99$. 

It should be noted that the confidence coefficient of $\gamma = 0.99$ was selected 
empirically and can be adjusted. When the confidence coefficient $\gamma$ is close to 1, 
only a few targets will be detected as anomalies. If the confidence coefficient $\gamma$ 
is set too low ($\gamma \ll 1$), many interferers and background signatures will be 
extracted as anomalies. 
Fortunately, with the proposed thresholding technique the sensitivity of selecting an 
appropriate threshold value can be reduced because it is based on the detected images 
rather than the original image. 
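
The thresholding rule can be sketched as below. This is a minimal illustration that replaces the histogram with the empirical distribution of the detector outputs, so that $P(a)$ becomes the fraction of pixels whose score is below $a$. That substitution, the function names and the synthetic data are assumptions of this example rather than details taken from the text.

```python
import math
import numpy as np

def rxd_scores(pixels):
    """RXD output (r - mu)^T K^{-1} (r - mu) for every row of an (N, L) array."""
    X = np.asarray(pixels, dtype=float)
    centered = X - X.mean(axis=0)
    K = centered.T @ centered / X.shape[0]
    whitened = np.linalg.solve(K, centered.T).T   # rows are K^{-1} (r - mu)
    return np.einsum("ij,ij->i", centered, whitened)

def rejection_probability(scores, a):
    """Empirical P(a): fraction of pixels whose score is strictly less than a."""
    return float(np.mean(np.asarray(scores) < a))

def automatic_threshold(scores, gamma=0.99):
    """Smallest observed score a with P(a) >= gamma, assuming distinct scores.

    The index is clamped so that the threshold never exceeds the largest score.
    """
    s = np.sort(np.asarray(scores, dtype=float))
    k = min(math.ceil(gamma * s.size), s.size - 1)
    return s[k]

# Toy check on eight distinct scores.
toy = np.arange(1.0, 9.0)
a_toy = automatic_threshold(toy, gamma=0.75)

# Synthetic image with one injected outlier at index 0.
rng = np.random.default_rng(3)
image = rng.normal(size=(1000, 5))
image[0] = 10.0
scores = rxd_scores(image)
a_gamma = automatic_threshold(scores, gamma=0.99)
mask = scores > a_gamma        # pixels declared anomalous
print("threshold:", a_gamma, "anomalies:", int(mask.sum()))
```

Raising `gamma` toward 1 shrinks the set of declared anomalies, which mirrors the trade-off described above.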

According to Fig. 14.2, 39 target pixels, $\{t_j\}_{j=1}^{39}$, were detected and were 
labeled in the order that they were detected from top to bottom and left to right. To 
further classify these 39 detected targets into separate target classes, the four target 
discrimination measures, CMD, CMFD, RMD and RMFD specified by (14.2)-(14.5), can be 
used. Since they all produced the same results, only the results of RMFD are presented 
here and shown in Fig. 14.3(a-k), where $t_i$ was designated as a seed pixel while $t_j$ 
ran through all the 39 target pixels. For example, in Fig. 14.3(a), a pixel of the first 
class was used as the seed pixel $t_i$ and $t_j$ was chosen to be any target from 
$\{t_j\}_{j=1}^{39}$. As shown in the plot, the peak values produced by the RMFD occurred 
at $t_1$, $t_2$, $t_3$ and $t_4$, and they were all clustered together. So, $t_1$, $t_2$, 
$t_3$ and $t_4$ were considered to be in the same class. It was clearly shown in 
Fig. 14.3(a-j) that the 39 detected targets were further clustered into 10 target classes, 
denoted by $\{t_1, t_2, t_3, t_4\}$, $\{t_5, t_6, t_{17}, t_{21}\}$, 
$\{t_7, t_9, t_{10}, t_{12}\}$, $\{t_8, t_{11}\}$, $\{t_{13}, t_{14}, t_{15}, t_{16}\}$, 
$\{t_{18}, t_{19}, t_{20}, t_{22}, t_{23}, t_{24}\}$, 
$\{t_{25}, t_{26}, t_{27}, t_{28}, t_{29}, t_{30}\}$, $\{t_{31}\}$, 
$\{t_{32}, t_{33}, t_{34}, t_{35}, t_{36}, t_{37}, t_{38}\}$ and $\{t_{39}\}$, where 15 
panels are classified in the images in Fig. 14.3(a,e,f,g,h,i). 

Figure 14.3. Plot generated by RMFD 

After target discrimination, we need to find what these targets were. According to the 
ground truth map provided by Fig. 1.7(b), each of the four panels of size 3 m x 3 m, 
$p_{21}$, $p_{31}$, $p_{41}$, $p_{51}$, contains two R pixels, whereas the panel $p_{11}$ 
of size 3 m x 3 m has only one R pixel. There is only one R pixel in each of the panels 
of size 2 m x 2 m and 1 m x 1 m. Based on this information, the 39 detected target pixels 
can be identified with 10 classes as follows. 

(1) Panels in row 1: $t_2$ = R pixel of $p_{11}$; $t_1$, $t_3$ = Y pixels of $p_{11}$; 
$t_4$ = R pixel of $p_{12}$ 

(2) $\{t_5, t_6, t_{17}, t_{21}\}$ = anomalies located in the forest 

(3) $\{t_7, t_9, t_{10}, t_{12}\}$ = anomalies located in the forest 

(4) $\{t_8, t_{11}\}$ = anomalies located in the forest 

(5) Panels in row 2: $t_{13}$, $t_{16}$ = R pixels of $p_{21}$; $t_{14}$ = Y pixel of 
$p_{21}$; $t_{15}$ = R pixel of $p_{22}$ 

(6) Panels in row 3: $\{t_{18}, t_{19}, t_{20}, t_{22}, t_{23}, t_{24}\}$ = R and Y pixels 
of $p_{31}$ and $p_{32}$ 

(7) Panels in row 4: $\{t_{25}, t_{26}, t_{27}, t_{28}, t_{29}, t_{30}\}$ = R and Y pixels 
of $p_{41}$ and $p_{42}$, including $t_{27}$ = Y pixel of $p_{42}$ 

(8) $t_{31}$ = anomaly in the forest 

(9) Panels in row 5: $\{t_{32}, t_{33}, t_{34}, t_{35}, t_{36}, t_{37}, t_{38}\}$ = R and 
Y pixels of $p_{51}$ and $p_{52}$ 

(10) $t_{39}$ = anomaly in a grass pixel 
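
The seed-based grouping that produces such classes can be sketched as follows. The text identifies class members from the peaks of the RMFD plots; this sketch stands in for that visual step with a normalized matched-filter value and a relative threshold `rho`, both of which are assumptions of the example, as are the function names and the synthetic data.

```python
import numpy as np

def rmfd_matrix(targets, R):
    """Pairwise values t_i^T R^{-1} t_j for an (n, L) array of targets."""
    T = np.asarray(targets, dtype=float)
    return T @ np.linalg.solve(R, T.T)

def group_by_seed(targets, R, rho=0.8):
    """Take each unassigned target in turn as a seed and collect its matches.

    A target joins the seed's class when its normalized matched-filter value
    with the seed is at least rho (an assumed stand-in for peak picking).
    """
    G = rmfd_matrix(targets, R)
    norm = np.sqrt(np.diag(G))
    C = G / np.outer(norm, norm)           # normalized to the range [-1, 1]
    labels = -np.ones(len(C), dtype=int)   # -1 marks an unassigned target
    next_label = 0
    for seed in range(len(C)):
        if labels[seed] >= 0:
            continue
        members = (labels < 0) & (C[seed] >= rho)
        labels[members] = next_label
        next_label += 1
    return labels

# Synthetic scene: 500 background pixels and two kinds of targets, 6 bands.
rng = np.random.default_rng(2)
base = np.ones(6)
s1 = base + np.array([2.0, -2.0, 0.0, 0.0, 0.0, 0.0])
s2 = base + np.array([0.0, 0.0, 0.0, 0.0, 2.0, -2.0])
background = base + 0.05 * rng.normal(size=(500, 6))
targets = np.vstack([s1 + 0.01 * rng.normal(size=(3, 6)),
                     s2 + 0.01 * rng.normal(size=(3, 6))])
image = np.vstack([background, targets])
R = image.T @ image / image.shape[0]

labels = group_by_seed(targets, R, rho=0.8)
print(labels)
```

The choice of `rho` controls how readily targets are merged; a value that is too low merges distinct classes and a value that is too high splits a class into several.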

Fig. 14.4 further shows the correlation among the 39 anomalous target pixels 
produced by RMFD where strong correlation occurs along the diagonal. 




Figure 14.4. Correlation among 39 detected anomalies produced by RMFD 

Fig. 14.4 shows that the target classes $\{t_1, t_2, t_3, t_4\}$ and 
$\{t_{13}, t_{14}, t_{15}, t_{16}\}$ clearly formed two individual and separate blocks. 
Since each of $\{t_{31}\}$ and $\{t_{39}\}$ only correlates with itself and does not 
correlate with other target pixels, there are two blocks formed by a single green pixel 
along the diagonal. In particular, $t_{31}$ separated the target class 
$\{t_{25}, t_{26}, t_{27}, t_{28}, t_{29}, t_{30}\}$ from the target class 
$\{t_{32}, t_{33}, t_{34}, t_{35}, t_{36}, t_{37}, t_{38}\}$ into two distinct blocks. 
Similarly, $t_{21}$ also broke up the middle block of the target class 
$\{t_{18}, t_{19}, t_{20}, t_{22}, t_{23}, t_{24}\}$ into two blocks where $t_{17}$ and 
$t_{21}$ form four bright green pixels around the middle block. The block highlighted by 
orange and red colors is formed by $\{t_9, t_{10}\}$ and shows strong correlation between 
these two anomalous pixels. As a matter of fact, these two pixels are strong interferers 
and are always extracted by automatic and unsupervised detection. There are also three 
anomaly classes resulting from the forest, $\{t_5, t_6, t_{17}, t_{21}\}$, 
$\{t_8, t_{11}\}$ and $\{t_7, t_9, t_{10}, t_{12}\}$. For example, 
$\{t_5, t_6, t_{17}, t_{21}\}$ was made up of three small blocks formed by 
$\{t_5, t_6\}$, $\{t_{17}\}$ and $\{t_{21}\}$. 

Fig. 14.5 shows the target discrimination image resulting from RMFD where 10 
target classes are highlighted by different colors. 

Figure 14.5. 10-class discrimination image of the 39 detected pixels 

Using the discrimination results provided by Fig. 14.3, the mean of each target class 
was calculated to represent the target information for this particular class. These 10 
pieces of target information were available for CEM detection. The detection results are 
given in Fig. 14.6 where the pixels of size 1 m x 1 m in rows 1, 2, 3 and 4 were extracted 
in Fig. 14.6(a,e,f,g), which were missed by RXD. 




Figure 14.6. Results of CEM using the target information derived from the discrimination results 



Finally, the LCMV classifier was used to classify all the 10 target classes in a single 
image shown in Fig. 14.7. 




Figure 14.7. Classification by LCMV using the target information derived from the discrimination results 

It should also be noted that RXD failed to detect all the five 1 m x 1 m panels $p_{i3}$, 
$1 \le i \le 5$, due to the small amounts of abundance present in these five panel pixels 
and their size being smaller than the 1.5 m spatial resolution. However, these missing 
panels can be extracted if we use one of the four target discrimination measures to 
generate the necessary target information for each target class. As shown in 
Fig. 14.6(a,e,f,g) and Fig. 14.7, it was indeed the case. The panels $p_{13}$, $p_{23}$, 
$p_{33}$ and $p_{43}$, which were not picked up by RXD, were actually detected in 
Fig. 14.6(a,e,f,g) by the CEM detector and classified by the LCMV classifier. This 
advantage results from the use of target information generated by a discrimination measure. 

As a final comment, in order for the four target discrimination measures specified by 
(14.2)-(14.5) to work effectively, the number of samples that form the sample covariance 
matrix must be sufficiently large to avoid singularity problems resulting from rank 
deficiency of the sample covariance matrix. In this case, the number of data samples must 
be greater than or equal to the number of spectral bands. Since the size of remotely 
sensed imagery is generally larger than the total number of spectral bands used for 
acquisition, this requirement is usually satisfied. For the real-time implementation of 
CRXD, which uses the sample correlation matrix, the real-time processing does not take 
place until the number of samples is greater than the number of spectral bands. In other 
words, there is no real-time processing in the first few lines before we collect enough 
data samples to form a non-singular sample correlation matrix. 
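
The waiting period before real-time processing can begin is illustrated by the sketch below, which accumulates a causal sample correlation matrix pixel by pixel and reports when it has become non-singular. The class name, the 1/n normalization and the rank test are assumptions chosen for this example.

```python
import numpy as np

class CausalCorrelation:
    """Running sample correlation matrix R(n) = (1/n) * sum_{k<=n} r_k r_k^T."""

    def __init__(self, num_bands):
        self.num_bands = num_bands
        self.count = 0
        self.sum_outer = np.zeros((num_bands, num_bands))

    def update(self, pixel):
        """Fold one newly arrived pixel vector into the running sum."""
        r = np.asarray(pixel, dtype=float)
        self.sum_outer += np.outer(r, r)
        self.count += 1

    def ready(self):
        """True once enough samples have arrived for R(n) to be full rank."""
        if self.count < self.num_bands:
            return False          # fewer samples than bands: R(n) is singular
        return bool(np.linalg.matrix_rank(self.sum_outer) == self.num_bands)

    def matrix(self):
        return self.sum_outer / self.count

# Feed synthetic 4-band pixels one at a time and record when R(n) is usable.
rng = np.random.default_rng(4)
acc = CausalCorrelation(num_bands=4)
status = []
for pixel in rng.normal(size=(6, 4)):
    acc.update(pixel)
    status.append(acc.ready())
print(status)
```

With fewer samples than bands the accumulated matrix cannot be full rank, so any detector that relies on its inverse has to wait, as described above.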



14.5 ANALYSIS ON TARGET CORRELATION USING TARGET 
DISCRIMINATION MEASURES 

As shown in Fig. 14.4, the color map produced by target correlation could be very 
useful in target classification since it describes the degree to which one target is 
correlated with another. Such a color map for visualizing correlation was used in Lee 
and Landgrebe (1993). In this section, a further analysis of target correlation is 
conducted for the four target discrimination measures using the 19 red pixels provided by 
the ground truth map in Fig. 1.7(b). Fig. 14.8(a-d) shows the color visualization of the 
correlation among the 19 R pixels (i.e., target center pixels) generated by CMD, RMD, 
CMFD and RMFD, respectively, where the vertical bars next to the figures show the degree 
of correlation using colors ranging from dark red to dark blue. A red color indicates 
high correlation while a blue color shows less correlation. 




Figure 14.8. Correlation among 19 R pixels produced by CMFD, RMFD, CMD and RMD 



According to the ground truth, these 19 red pixels are painted with five different 
materials. This fact is reflected in Fig. 14.8(a-d), where five small square blocks are 
clearly shown along the diagonal. Fig. 14.8(a-b) shows the correlation maps produced by 
the Mahalanobis distance-based measures (i.e., CMD and RMD) where a low value implies a 
high correlation as shown by blue color. By contrast, Fig. 14.8(c-d) shows the correlation 
maps using the matched filter-based distance measures (i.e., CMFD and RMFD) where a 
higher value represents a higher correlation as shown by colors other than blue. 
Furthermore, because the five 1 m x 1 m panels $p_{i3}$, $1 \le i \le 5$, which were not 
detected by RXD contained a very small amount of abundance, their correlation was weak. 
This was evidenced in Fig. 14.8(c-d) where there was a one-pixel gap between two 
consecutive blocks. 
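
The opposite reading of the two kinds of maps can be made concrete with a small sketch that builds both pairwise matrices for the same set of targets. It assumes the forms $(t_i - t_j)^T R^{-1} (t_i - t_j)$ and $t_i^T R^{-1} t_j$ for the Mahalanobis-type and matched filter-type measures. The function name and the synthetic two-class data are hypothetical.

```python
import numpy as np

def pairwise_maps(targets, R):
    """Return (D, G): Mahalanobis-type and matched filter-type pairwise maps."""
    T = np.asarray(targets, dtype=float)
    G = T @ np.linalg.solve(R, T.T)          # G[i, j] = t_i^T R^{-1} t_j
    g = np.diag(G)
    # Expanding (t_i - t_j)^T R^{-1} (t_i - t_j) gives G_ii + G_jj - 2 G_ij.
    D = g[:, None] + g[None, :] - 2.0 * G
    return D, G

# Synthetic scene: 500 background pixels and two kinds of targets, 6 bands.
rng = np.random.default_rng(5)
base = np.ones(6)
s1 = base + np.array([2.0, -2.0, 0.0, 0.0, 0.0, 0.0])
s2 = base + np.array([0.0, 0.0, 0.0, 0.0, 2.0, -2.0])
background = base + 0.05 * rng.normal(size=(500, 6))
targets = np.vstack([s1 + 0.01 * rng.normal(size=(3, 6)),
                     s2 + 0.01 * rng.normal(size=(3, 6))])
image = np.vstack([background, targets])
R = image.T @ image / image.shape[0]

D, G = pairwise_maps(targets, R)
true_class = np.array([0, 0, 0, 1, 1, 1])
same = true_class[:, None] == true_class[None, :]   # True inside diagonal blocks
print("Mahalanobis-type: within max", D[same].max(), "across min", D[~same].min())
print("Matched filter:   within min", G[same].min(), "across max", G[~same].max())
```

In the first map the diagonal blocks hold the smallest values, while in the second they hold the largest, so the two maps need opposite color conventions to convey the same block structure.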








Table 14.2. Values of RMFD between the 19 R panel pixels 

Table 14.4. Values of RMD between the 19 R panel pixels 

Panel 1 Panel 2 Panel 3 Panel 4 Panel 5 

If we compare Fig. 14.8(c-d) against Fig. 14.8(a-b), it is obvious that the former 
characterizes correlation better than does the latter except for the panels $p_{i3}$, 
$1 \le i \le 5$. In other words, the matched filter-based distance measures performed a 
little better than the Mahalanobis distance-based measures in terms of capturing 
correlation among most panel pixels. 

In order to further analyze quantification performance, we calculate the correlation 
values for these 19 panel pixels using all the four target discrimination measures and 
tabulate their results in Tables 14.1-14.4. As shown in Tables 14.1-14.4, the five small 
blocks showing strong correlation in Fig. 14.8(a-d) are also shown in the tables. Like 
Fig. 14.8(a-d), the values in Tables 14.1-14.2 also characterize panel correlation better 
than those in Tables 14.3-14.4. 



14.6 ON-LINE IMPLEMENTATION 

In order to demonstrate on-line processing for anomaly classification, we will 
implement a CRXD-LCMV hybrid classifier using RMFD specified by (14.5) as the target 
discrimination measure along with the automatic thresholding described in Section 14.4. 
Since there was no appreciable difference between RMD specified by (14.4) and RMFD 
as shown in Chiang and Chang (2001) and Chiang (2001), we chose RMFD due to the 
fact that RMFD can be easily implemented by a QR-decomposition. The whole 
process consists of three stages: the CRXD implementation in step 1), a target-clustering 
process using RMFD, and the real-time LCMV classification developed in Chang et al. 
(2001). Like RXD, which can be implemented in real time, the CRXD-LCMV anomaly 
detection and classification can also be implemented as on-line processing with 
negligible time lag. Since it requires two passes, a time delay is inevitable. 
Nevertheless, it can be minimized. A similar phenomenon was also discussed in 
Section 10.5 where the term near real-time process was used to distinguish it from the 
on-line process used here. 

The first pass is carried out in the same manner as CRXD to detect anomalies in real 
time. The second pass of discrimination and classification takes place with a delay of 
only a few lines (or a set of pixels if it is implemented on a pixel-by-pixel basis) 
right after CRXD is executed. The algorithm takes advantage of the time lag of 
a few lines to generate sufficient target information for the follow-up classification. Since 
the classifier used in anomaly classification is the LCMV classifier, which can also be 
implemented in real time as described in Chang et al. (2001), the proposed two-pass 
anomaly detection and classification can essentially be implemented in a timely manner 
with a delay of a few lines between the two passes. Such real-time processing with a time 
delay of a few lines is referred to as an on-line process. 

It should be noted that instead of the sample covariance matrix the sample 
correlation matrix was used in CRXD and the target discrimination measure (i.e., either 
RMD or RMFD) to achieve real-time capability. As demonstrated in Chapter 11, a 
real-time process can be implemented in either a line-by-line or a pixel-by-pixel fashion. 
However, a difficulty arises in the process of target discrimination, which must be done 
right after detection and prior to classification. In addition, the target information 
generated by discrimination also needs to be updated either every line or every pixel 
depending upon which real-time process is used. In order to alleviate this problem, we 
build a look-up table and update it while the detection process is in progress. The 
look-up table contains target templates produced by an anomaly detector and a target 
discrimination measure. Each template represents one distinct type of target and will be 
used as the desired target information for classification. Initially, there is no target 
template in the look-up table. As soon as the anomaly detector detects its first anomaly, 
it will be included as the first target template in the table. When a second anomaly is 
detected, the target discrimination measure then compares the new detected anomalous 
target to the first target template. If the detected target is close and similar to the target 
template, the first target template will be updated and replaced by a new target template 
which is obtained by averaging the new detected target and the first target template, and 
this new target will be then discarded. Otherwise, the second detected anomaly will be 
included in the table as the second target template. This look-up table keeps track of new 
target templates and is updated by comparing the next detected anomaly against the 
existing target templates. If there is a match between the newly detected anomaly and a 
target template in the table, the matched target template will be updated by the average of 
the matched target template and the new target, and the new target will be eliminated. If 
there is no match, the new target will be used to create a new target template on its own 
and will be included in the table. This particular on-line anomaly classifier is referred to 
as the RXD-LCMV hybrid classifier. Although there may be a small time lag in 
implementing the target discrimination between the target detection and classification, 
the three operations can actually be processed in parallel. Even in the case that the 
detection, discrimination and classification are implemented sequentially by computer, 
the whole process can be nearly simultaneous if the size of the image scene to be 
processed is not very large. For the 15-panel HYDICE scene with size of 64 x 64 x 169 in 
Fig. 1.7(a), there was no appreciable time delay in real-time implementation. Fig. 14.9 
shows an on-line target discrimination result produced by the CRXD using the RMFD as 
the target discrimination measure where the line number underneath each image indicates 
the result was obtained by using the causal information up to that particular line. As we 
can see, the results in Fig. 14.9 are completely different from those in Fig. 6.5(a). Other 
than the 2-pixel anomaly detected in Fig. 6.5(a), many additional anomalies were picked 
up and discriminated by RMFD. This is because the target discrimination process 
occurred immediately after the detection process. For example, cinders, shade and 
vegetation were detected in Fig. 14.9(a-b), followed by playa detected after line 
80 in Fig. 14.9(c-g). The rhyolite was extracted after line 120 in Fig. 14.9(e-g). 
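
The look-up table update described above can be sketched as follows. The averaging rule follows the description in the text, while the class name, the normalized matching score and the threshold value are assumptions introduced for this example. For brevity the matrix used by the matching score is held fixed here, whereas in the on-line setting it would be updated causally.

```python
import numpy as np

def normalized_match(R):
    """Return a similarity function based on a^T R^{-1} b, scaled to [-1, 1]."""
    def similarity(a, b):
        Rinv_a = np.linalg.solve(R, a)
        Rinv_b = np.linalg.solve(R, b)
        return float(a @ Rinv_b / np.sqrt((a @ Rinv_a) * (b @ Rinv_b)))
    return similarity

class TemplateTable:
    """Look-up table of target templates, one per distinct type of target."""

    def __init__(self, similarity, match_threshold):
        self.similarity = similarity
        self.match_threshold = match_threshold   # assumed decision rule
        self.templates = []

    def add(self, target):
        """Merge a detected anomaly into its best match or open a new template.

        Returns the index of the template that now represents the target.
        """
        t = np.asarray(target, dtype=float)
        if self.templates:
            scores = [self.similarity(t, tpl) for tpl in self.templates]
            best = int(np.argmax(scores))
            if scores[best] >= self.match_threshold:
                # Replace the template by the average of itself and the target.
                self.templates[best] = 0.5 * (self.templates[best] + t)
                return best
        self.templates.append(t.copy())
        return len(self.templates) - 1

# Toy run with an identity matrix so the result is easy to follow by hand.
table = TemplateTable(normalized_match(np.eye(2)), match_threshold=0.9)
assigned = [table.add(t) for t in ([1.0, 0.0], [0.9, 0.1], [0.0, 1.0])]
print(assigned, [tpl.tolist() for tpl in table.templates])
```

Because each match replaces the template by a simple two-way average, recent detections weigh more heavily than early ones. A running mean over all matched targets would be an alternative design.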




Figure 14.9. On-line target discrimination results of the AVIRIS scene resulting from the CRXD using RMFD 

When the process was completed, the CRXD-LCMV classifier actually detected and 
classified all the five major substances in the image scene, cinders, playa, rhyolite, shade 
and vegetation, as shown in Fig. 14.10. Unfortunately, it missed the anomaly because it 
was averaged out by target discrimination. 




Figure 14.10. On-line target classification results of the AVIRIS scene resulting from the CRXD using RMFD 
in conjunction with the LCMV classifier 

Like the CRXD-LCMV classifier, CRXD also detected many anomalies in 
Fig. 14.11(a-h). Unfortunately, all these detected anomalies suddenly vanished as soon as 
a two-pixel anomaly was detected at the upper edge of the lake in Fig. 14.11(i). The 
image remained almost unchanged until the whole process was completed. 




Figure 14.11. On-line target detection results of the AVIRIS scene resulting from the CRXD without target 
discrimination 



A problem that arises in RXD is that it cannot discriminate among the anomalies 
it detects, and all the detected anomalies are shown in images according to their gray 
scales. As a consequence, a weak anomaly of one type may be dominated by a strong 
anomaly of another type. This explains why only the 2-pixel anomaly was detected 
by RXD in Fig. 6.5(a): its abundance was so strong that it eventually dominated 
other anomalies which may belong to different types of targets, but had relatively low 
abundance fractions. This did not imply that they were not detected by RXD. It is simply 
too difficult to detect their presence by visual inspection since their gray level values 
were scaled down by the strong 2-pixel anomaly. However, if target discrimination is 
applied right after RXD, targets of different types can be separated and would not be 
dominated by targets of other types. In this case, different classes of targets can be 
detected as shown in Fig. 14.9. This experiment provides an interesting example. That 
is, when target discrimination is implemented in conjunction with CRXD, the resulting 
CRXD-LCMV becomes a classifier rather than an anomaly detector. 

A similar experiment was also conducted using the 15-panel HYDICE image scene. Fig. 
14.12 (a-g) shows the results by the line-by-line on-line process of the CRXD-LCMV 
classifier. 




Figure 14.12. On-line target discrimination results of the HYDICE scene resulting from the CRXD using 
RMFD 

As shown in Fig. 14.12(a), an interferer and the first panel pixel in row 1 were 
detected and discriminated by yellow and green colors respectively. Then two more 
interferers were picked up in Fig. 14.12(b) highlighted by red. This was followed by 
Fig. 14.12(c-g) where panels in rows 2-5 were extracted respectively and also 
discriminated by different colors. Fig. 14.12(h) shows the results of the complete on-line 
target discrimination process. Fig. 14.13 (a-h) shows the results of the on-line CRXD- 
LCMV classification right after the on-line target discrimination process. 




Figure 14.13. On-line target classification results of the 15-panel HYDICE scene resulting from the CRXD 
using RMFD in conjunction with the LCMV classifier 

It used the target information generated from the on-line target discrimination 
process to classify different types of targets. If we compare Fig. 14.10 to Fig. 14.9 and 
Fig. 14.13 to Fig. 14.12, the images in Figs. 14.9 and 14.12 were cleaner than those 
in Figs. 14.10 and 14.13. This is because the target discrimination was performed only on 
the pixels detected by RXD while the target classification was carried out for all 
image pixel vectors. Fig. 14.14 also shows the results produced by CRXD without target 
discrimination. 




Figure 14.14. On-line detection results of the HYDICE scene resulting from the CRXD without target 
discrimination 

As we can see from Fig. 14.14, CRXD detected some panels in rows 1-3 in Fig. 
14.14(a-f). However, as soon as the panels in row 4 were detected in Fig. 14.14(g), the 
previously detected target pixels suddenly became dim as shown in Fig. 14.14(h-l). This 
phenomenon remained until the end of the process. If we compare the final result of 
CRXD in Fig. 14.14(l) to that generated by the CRXD-LCMV in Fig. 14.13(g), CRXD-LCMV 
using target discrimination performed significantly better than CRXD with no 
target discrimination included in target detection and classification. In particular, the 
former extracted a few panel pixels in row 2 and one panel pixel in row 1 which were 
missed by the latter in the first column of Fig. 14.10(g). So, this experiment further 
shows that the RXD-LCMV classifier using target discrimination actually performed and 
operated as a classifier. 



14.7 CONCLUSIONS 

This chapter extends the anomaly detection in Chapter 6 to anomaly classification. 
Such extension is challenging because an anomaly detector does not necessarily 
discriminate among the targets it detected. It requires a discrimination measure to 
differentiate the detected targets from one another. To meet this need, four target 
discrimination measures are presented which can be categorized into two classes, the 
Mahalanobis distance-based measures and matched filter-based target discrimination 
measures. The former measures the distance between two target pixel vectors using the 
data covariance/correlation matrix to take care of sample spectral correlation. The latter 
measures the distance between two target pixel vectors based on the degree to which both 
pixel vectors match each other. Interestingly, experimental results show that these four 
target discrimination measures perform very similarly. In order to further implement 
anomaly classification in real time, as we did for anomaly detection in Chapter 6, 
a hybrid of RXD and LCMV, called the RXD-LCMV anomaly classifier, is also developed 
for real-time implementation. It combines CRXD with an LCMV classifier by 
incorporating a target discrimination measure prior to target classification. The 
experiments demonstrate that the RXD-LCMV anomaly classifier performs very 
differently from the anomaly detector, RXD, which does not use target discrimination. This 
shows that in order for an anomaly detector to classify its detected anomalous targets, 
target discrimination is a necessary and critical step for success in anomaly 
classification. 



AUTOMATIC MIXED PIXEL CLASSIFICATION 
(AMPC): LINEAR SPECTRAL RANDOM 
MIXTURE ANALYSIS (LSRMA) 



Independent component analysis (ICA) has shown much success in blind source 
separation and channel equalization. Its applications to remotely sensed images have been 
investigated in recent years. Linear spectral mixture analysis (LSMA) has been widely 
used for subpixel detection and mixed pixel classification in remote sensing image 
processing and has already been studied in Chapters 3, 8-10, 13. It models the spectral 
signature 
of an image pixel vector as a linear mixture of spectral signatures of targets present in the 
image data and the target abundance fractions are assumed to be unknown, but 
nonrandom constants. This chapter combines these two approaches into one, called 
ICA-based linear spectral random mixture analysis (LSRMA), which describes an image pixel 
vector as a random process resulting from a random composition of multiple spectral 
signatures of distinct targets in the image data. It differs from LSMA in that the 
abundance fractions of the target signatures in LSRMA are considered to be unknown 
random independent signal sources. LSRMA offers two major advantages. First, LSRMA 
does not require any prior knowledge of the targets used in the 
linear mixture model. Second and most importantly, LSRMA models each of the target 
signatures as an independent random signal source so that the spectral variability of target 
signatures can be captured more effectively in a stochastic manner. The only required 
knowledge for LSRMA is the number of signal sources, p assumed to be present in the 
image data. This number is estimated by Neyman-Pearson detection theory-based eigen- 
thresholding methods to be investigated in Chapter 17. 



15.1 INTRODUCTION 

Linear spectral mixture analysis (LSMA) assumes that the spectral signature of 
an image pixel vector is a linear mixture of the spectral signatures of targets 
resident in the image data. Two restrictions generally limit the utility of LSMA. One is that 
complete target knowledge must be given a priori. In many practical applications, 
obtaining such a priori information is usually very difficult. To relax this requirement, 
several unsupervised methods discussed in Chapter 5 can be used to generate the necessary 
target information directly from the image data. The second restriction is that the 
abundance fractions of targets must be constrained to achieve better estimation. When the 
target abundance fractions are considered as unknown nonrandom constants, they can be 
estimated by statistical methods such as least-squares estimation. However, due to many 
unknown factors such as atmospheric effects, nonstationary image background, etc., the 
spectral signatures of targets may vary pixel by pixel. Therefore, it is more realistic to 
assume that the abundance fractions of targets in a pixel vector are random quantities 
rather than unknown constants so that the spectral variability of a target can be described 
and captured by a random process. In this case, it is more appropriate to assume that an 
image pixel vector is a random linear mixture where the abundance fraction of each target 
signature is considered as a random signal source. 

Independent component analysis (ICA) has been studied extensively in blind source 
separation and other applications (Hyvarinen and Oja, 2000; Comon, 1994; Karhunen et 
al., 1997; Oja et al., 1995; Bell and Sejnowski, 1995; Lee, 1998; Girolami, 2000). The 
idea of ICA offers a feasible approach to solving the random abundance mixture problem 
described above. Applications of ICA to remotely sensed image analysis have been 
recently investigated (Bayliss et al., 1997; Zhang and Chen, 2002; Szu, 1999; Tu, 2000; 
Chiang et al., 2000). ICA differs from principal components analysis (PCA) in many 
aspects. PCA decorrelates the sample data covariance matrix in the sense that the 
data set is decomposed into a set of uncorrelated and orthogonal components where each 
component is oriented by an eigenvector. With such an eigen-decomposition, PCA can 
achieve data compaction and dimensionality reduction by preserving the most significant 
information in principal components in terms of data variance. So, PCA is generally 
used for information preservation, but not for target detection and image classification. In 
contrast to PCA, ICA looks for components that are statistically independent rather 
than uncorrelated; thus, it requires statistics of orders higher than variance and 
covariance. In addition, the ICA-generated components are not necessarily geometrically 
orthogonal. ICA utilizes a linear model to describe a mixture of a set of unknown 
random signal sources, then unmixes these signal sources into separate components so as 
to achieve signal detection and classification. If we further assume that the abundance 
fraction of each target signature in LSMA is an unknown and independent random signal 
source, the source mixing model considered in ICA can be readily applied to LSMA. In 
this case, ICA can be used to solve for random abundance fractions of target signatures. 
Such an ICA-based LSMA can be viewed as a random version of LSMA and will be 
referred to as linear spectral random mixture analysis (LSRMA). 
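
The distinction between uncorrelated and independent components can be illustrated with a small numerical sketch. The variables x and y below are hypothetical and are not taken from any image data: y is completely determined by x, yet their correlation is essentially zero, so second-order statistics alone cannot reveal the dependence.

```python
import numpy as np

rng = np.random.default_rng(4)

# x is uniform on [-1, 1]; y is a deterministic, zero-mean function of x.
x = rng.uniform(-1.0, 1.0, size=100000)
y = x**2 - 1.0 / 3.0

# Second-order statistic: x and y are (nearly) uncorrelated because E[x^3] = 0.
corr_xy = np.corrcoef(x, y)[0, 1]

# A higher-order statistic exposes the dependence: y is an affine function of x^2.
corr_x2y = np.corrcoef(x**2, y)[0, 1]

print(f"corr(x, y)   = {corr_xy:+.4f}")
print(f"corr(x^2, y) = {corr_x2y:+.4f}")
```

A decorrelating transform such as PCA would treat x and y as already separated, whereas an independence criterion based on higher-order statistics would not.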

In order for LSRMA to be effective, several assumptions must be made. One is that 
the source components must be mutually statistically independent. This implies that the 
target signatures present in the image data must be spectrally distinct. A second 
assumption is that at most one source component is allowed to be Gaussian. This is 
because a sum of Gaussian processes is also Gaussian and ICA cannot separate Gaussian 
processes using a linear mixture model. In remotely sensed imagery, the number of target 
pixel vectors, such as small man-made targets, anomalies or rare minerals, is relatively 
small compared to the entire image. Due to the large number of background pixels, we may 
assume that image background pixel vectors are Gaussian-distributed and small target 
pixel vectors are non-Gaussian signal sources. If we further assume that the noise is 
additive white Gaussian, then the background pixel vectors and the white Gaussian noise 
pixel vectors will be classified into a single class that is assigned to the same Gaussian 
independent component since ICA cannot separate two Gaussian sources. Since the 
targets of interest are considered as non-Gaussian signal sources in LSRMA, they will be 
separated by ICA into different components. As will be shown in the experiments, this is indeed 
the case. 

The ICA proposed in LSRMA is slightly different from the commonly used ICA in 
two ways. First, the separating matrix (also referred to as the unmixing matrix) W derived 
from ICA is usually assumed to be a square matrix of full rank and an orthogonal matrix. 
In this case, the number of signal sources, say p, must be equal to the data dimensionality, 
L. As demonstrated in our experiments, this assumption may not be valid for 
hyperspectral images where the number of signal sources is generally much smaller than 
the number of bands, i.e. p < L. As a result, the matrix W is not of full rank and 
the learning rule derived from inverting W may become unstable and may not converge. 
A problem of this type is called under-complete ICA and has been investigated in 
Amari (1999). Second, the mixing matrix M is often assumed to be orthogonal. In 
remote sensing image classification, the matrix M is made up of the spectral signatures of 
targets of interest, which are distinct but not necessarily orthogonal. With the 
orthogonality assumption for W, the learning algorithm designed for ICA is forced to 
look for an orthogonal decomposition of signal sources. The resulting independent 
components may be able to unmix targets with signatures spectrally orthogonal to each 
other, but may not effectively unmix targets with similar spectral signatures. So, instead 
of imposing the constraint that the covariance matrix of the separating matrix W must be 
an identity matrix as is done in the standard ICA, our proposed ICA approach is derived 
by using the constraint that the covariance matrix of the unmixed abundance fractions of 
target signatures must be an identity matrix. This enables us to design a 
learning algorithm that converges to independent components that can separate spectrally 
similar targets. A similar approach was also investigated in Cardoso (1996). The 
experiments show that such a learning algorithm is very useful and suitable for 
hyperspectral image analysis because many targets extracted from a hyperspectral image 
do have similar spectral signatures, which may not be orthogonal. 

Like LSMA, LSRMA also requires knowing the number of independent sources, p, 
in the image data. To estimate p, three Neyman-Pearson detection theory-based 
eigen-thresholding methods recently developed for estimating the intrinsic 
dimensionality can be used (Chang and Du, 1999; Harsanyi et al., 1994); 
they are also discussed in Chapter 17. This approach converts the determination of the number of signal 
sources to a binary composite hypothesis testing problem where the number of times a 
Neyman-Pearson test fails is the number of unknown signal sources present in the image 
data. It turns out that this approach provides a good estimate of p. 



15.2 INDEPENDENT COMPONENT ANALYSIS (ICA) 

Following the same notation used in Section 3.2 and (3.1), a linear spectral mixture 
model is described by 

$$\mathbf{r} = \mathbf{M}\boldsymbol{\alpha} + \mathbf{n}$$ (15.1) 

where $\mathbf{r}$ is an $L \times 1$ column pixel vector, $\mathbf{n}$ is noise, $\mathbf{M}$ is an $L \times p$ target signature 
matrix, denoted by $[\mathbf{m}_1\ \mathbf{m}_2\ \cdots\ \mathbf{m}_p]$ with $\mathbf{m}_j$ being an $L \times 1$ column vector represented 
by the signature of the $j$-th target, and $p$ is the total number of targets assumed to be in 
the image. Suppose that $\boldsymbol{\alpha} = (\alpha_1, \alpha_2, \ldots, \alpha_p)^T$ is a $p \times 1$ abundance column vector 
associated with $\mathbf{r}$ where $\alpha_j$ denotes the abundance fraction of $\mathbf{m}_j$, and $\mathbf{r}$ will be used to 
represent either the pixel vector $\mathbf{r}$ or its spectral signature. A spectral linear unmixing 
method estimates the unknown abundance fractions, $\alpha_1, \alpha_2, \ldots, \alpha_p$, via an inverse of the 
linear mixture model described by (15.1). One requirement of LSMA is that the signature 
matrix $\mathbf{M}$ must be known a priori. Many approaches have been proposed in the past to 
obtain $\mathbf{M}$ directly from the image data by unsupervised means. Here, we investigate a 
rather different and interesting approach, independent component analysis (ICA), which is 
also based on model (15.1) but does not require prior knowledge of $\mathbf{M}$. Unlike 
the techniques discussed so far in this book, it assumes that the $p$ abundance 
fractions, $\alpha_1, \alpha_2, \ldots, \alpha_p$, are unknown random signal sources rather than unknown 
nonrandom parameters. In this case, three additional assumptions must be made on the 
abundance vector $\boldsymbol{\alpha} = (\alpha_1, \alpha_2, \ldots, \alpha_p)^T$. 

(i) The $p$ target signatures, $\mathbf{m}_1, \mathbf{m}_2, \ldots, \mathbf{m}_p$, in $\mathbf{M}$ must be spectrally distinct. 

(ii) The $p$ abundance fractions, $\alpha_1, \alpha_2, \ldots, \alpha_p$, are mutually statistically independent 
random sources. 

(iii) Each of the $p$ abundance fractions, $\alpha_1, \alpha_2, \ldots, \alpha_p$, must be a zero-mean random source 
and at most one source is Gaussian. 

In remotely sensed imagery, the first assumption simply says that there are p 
spectrally distinct targets in a scene and each column vector in M represents one distinct 
target in the scene. The second assumption implies that the compositions of distinct 
targets in a pixel r are random quantities, one independent of another. The third 
assumption suggests that if targets of interest are not Gaussian sources, ICA can detect 
and classify them from the image background in separate independent components. This 
is based on the fact that due to a very large number of background pixels in the image the 
natural background pixels can be considered as a Gaussian-like source while targets can 
be viewed as non-Gaussian signal sources. Since ICA cannot separate Gaussian processes 
using a linear mixture model, ICA classifies the background area and additive Gaussian 
noise into one single independent component, whereas different targets are classified into 
other separate independent components. Accordingly, automatic target detection and 
classification can be accomplished by LSRMA. It should be noted that except for the 
three above assumptions, no prior information is assumed about model (15.1). 
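
A minimal sketch of model (15.1) under the three assumptions is given below. All values are hypothetical: the signature matrix M is filled with random positive numbers rather than real spectra, and the Laplacian, uniform and Gaussian sources are merely convenient choices of independent zero-mean sources, only one of which is Gaussian.

```python
import numpy as np

rng = np.random.default_rng(0)
L, p, N = 50, 3, 20000   # bands, sources, pixels: illustrative values only

# Hypothetical signature matrix M (L x p); columns are distinct, not orthogonal.
M = np.abs(rng.normal(size=(L, p))) + 0.5

# Independent, zero-mean, unit-variance abundance sources (assumptions ii and iii).
alpha = np.vstack([
    rng.laplace(size=N) / np.sqrt(2.0),                 # heavy-tailed, non-Gaussian
    rng.uniform(-np.sqrt(3.0), np.sqrt(3.0), size=N),   # flat, non-Gaussian
    rng.normal(size=N),                                 # the one Gaussian source allowed
])

noise = 0.01 * rng.normal(size=(L, N))   # additive white Gaussian noise n
R = M @ alpha + noise                    # column k is one pixel vector r = M alpha + n

def excess_kurtosis(x):
    """Fourth standardized cumulant; close to zero for a Gaussian source."""
    z = (x - x.mean()) / x.std()
    return np.mean(z**4) - 3.0

kurt = [excess_kurtosis(a) for a in alpha]
print("excess kurtosis per source:", np.round(kurt, 2))
```

The excess kurtosis is clearly non-zero for the first two sources and near zero for the Gaussian one, which is the property that lets a higher-order criterion tell target-like sources apart from a Gaussian background.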



15.3 ICA-BASED LSRMA 

In order to implement ICA using model (15.1), the mixing matrix and the unknown 
signal sources used in blind source separation are replaced with the target signature 
matrix $\mathbf{M}$ and the $p$ random abundance fractions denoted by $\alpha_1, \alpha_2, \ldots, \alpha_p$, respectively. 
With this interpretation ICA finds a $p \times L$ separating matrix $\mathbf{W}$ and applies it to an 
image pixel $\mathbf{r}$ to unmix the $p$ abundance fractions, $\alpha_1, \alpha_2, \ldots, \alpha_p$. More precisely, ICA 
solves an inverse problem of model (15.1) for the $p \times L$ separating matrix $\mathbf{W}$ via the 
following equation 

$$\hat{\boldsymbol{\alpha}}(\mathbf{r}) = \mathbf{W}\mathbf{r},$$ (15.2) 

where $\hat{\boldsymbol{\alpha}}(\mathbf{r}) = (\hat{\alpha}_1(\mathbf{r}), \hat{\alpha}_2(\mathbf{r}), \ldots, \hat{\alpha}_p(\mathbf{r}))^T$ is the estimate of the abundance fractional vector 
$\boldsymbol{\alpha}$ derived from $\mathbf{r}$. Since changing the order of the components in $\hat{\boldsymbol{\alpha}}(\mathbf{r})$ does not 
affect their statistical independence, the estimate of the $i$-th abundance fraction $\alpha_i$ may 
appear as any component $\hat{\alpha}_j(\mathbf{r})$ of $\hat{\boldsymbol{\alpha}}(\mathbf{r})$. Further, because multiplying random variables 
by non-zero scalar factors does not affect statistical independence, it is also 
impossible to determine the true amounts of the abundance fractions, $\alpha_1, \alpha_2, \ldots, \alpha_p$, 
from model (15.1) without additional assumptions. Unless we are interested in target 
quantification, the order and the true abundance fractions are generally not crucial in 
detection and classification of targets. In this case, we can normalize each abundance 
fraction source to unit variance so that the covariance matrix of the abundance fraction 
sources becomes the identity matrix. This can be simply done by a sphering (whitening) 
process studied in Section 12.3 of Chapter 12. 
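
As an illustration of the sphering step, the sketch below whitens a set of pixel vectors with an eigen-decomposition of the sample covariance matrix. It is a generic whitening routine, assumed here for illustration and not necessarily the exact procedure of Section 12.3; the function name `sphere` and the toy data are hypothetical.

```python
import numpy as np

def sphere(R):
    """Whiten pixel vectors stored as the columns of R (L x N).

    Returns the whitened data Z and the whitening matrix V, where
    Z = V @ (R - mean) has an identity sample covariance matrix.
    """
    mean = R.mean(axis=1, keepdims=True)
    Rc = R - mean                                  # remove the sample mean
    cov = Rc @ Rc.T / Rc.shape[1]                  # L x L sample covariance
    eigval, eigvec = np.linalg.eigh(cov)           # symmetric eigen-decomposition
    keep = eigval > 1e-10 * eigval.max()           # drop numerically null directions
    V = (eigvec[:, keep] / np.sqrt(eigval[keep])).T
    return V @ Rc, V

# Toy data: 4 hypothetical bands obtained by mixing 4 Laplacian sources.
rng = np.random.default_rng(1)
A = rng.normal(size=(4, 4))
R = A @ rng.laplace(size=(4, 5000))

Z, V = sphere(R)
cov_Z = Z @ Z.T / Z.shape[1]
print(np.round(cov_Z, 6))                          # approximately the identity matrix
```

After this step only statistics of order higher than two remain to be exploited, which is what the criteria of the next subsection rely on.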



15.3.1 Relative Entropy-Based Measure for ICA 

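In order to use ICA, a criterion is required to measure the statistical independence 
among the estimated abundance fractions $\hat{\alpha}_1(\mathbf{r}), \hat{\alpha}_2(\mathbf{r}), \ldots, \hat{\alpha}_p(\mathbf{r})$. According to 
information theory (Cover and Thomas, 1991), relative entropy or the Kullback-Leibler 
information distance function is an appropriate measure. Let 
$p(\hat{\boldsymbol{\alpha}}(\mathbf{r})) = p(\hat{\alpha}_1(\mathbf{r}), \hat{\alpha}_2(\mathbf{r}), \ldots, \hat{\alpha}_p(\mathbf{r}))$ be the joint probability density function (pdf) of the 
estimated random abundance vector $\hat{\boldsymbol{\alpha}}(\mathbf{r})$ obtained from (15.2) and $p(\alpha_j)$ be the 
marginal pdf of the $j$-th abundance fraction $\alpha_j$ for $1 \le j \le p$ given by (15.2). Since 
$\alpha_1, \alpha_2, \ldots, \alpha_p$ in model (15.1) are assumed to be independent, $p(\boldsymbol{\alpha}) = \prod_{j=1}^{p} p(\alpha_j)$. If we 
assume that $\boldsymbol{\alpha}$ is a random source vector and $\hat{\boldsymbol{\alpha}}(\mathbf{r})$ is the estimate of $\boldsymbol{\alpha}$ from the 
observation vector $\mathbf{r}$, then the entropy of $\boldsymbol{\alpha}$ relative to $\hat{\boldsymbol{\alpha}}(\mathbf{r})$ is defined by 

$$\begin{aligned}
D\Big(p(\hat{\boldsymbol{\alpha}}(\mathbf{r})) \,\Big\|\, p(\boldsymbol{\alpha}) = \textstyle\prod_{j=1}^{p} p(\alpha_j)\Big)
&= \sum p(\hat{\boldsymbol{\alpha}}(\mathbf{r})) \log \frac{p(\hat{\boldsymbol{\alpha}}(\mathbf{r}))}{p(\boldsymbol{\alpha})} \\
&= \sum p(\hat{\boldsymbol{\alpha}}(\mathbf{r})) \log \frac{p(\hat{\boldsymbol{\alpha}}(\mathbf{r}))}{\prod_{j=1}^{p} p(\alpha_j)} \\
&= -H(\hat{\boldsymbol{\alpha}}(\mathbf{r})) - \sum p(\hat{\boldsymbol{\alpha}}(\mathbf{r})) \Big[\sum_{j=1}^{p} \log p(\alpha_j)\Big]
\end{aligned}$$ (15.3) 

where $H(\hat{\boldsymbol{\alpha}}(\mathbf{r}))$ is the entropy of the estimated abundance vector $\hat{\boldsymbol{\alpha}}(\mathbf{r})$ as defined in 
(2.35) in Chapter 2. Minimizing (15.3) over $\hat{\boldsymbol{\alpha}}(\mathbf{r})$ through $\mathbf{W}$ in (15.2) is equivalent to 
minimizing the discrepancy between the pdfs $p(\boldsymbol{\alpha})$ and $p(\hat{\boldsymbol{\alpha}}(\mathbf{r}))$. That is, the smaller 
$D(p(\hat{\boldsymbol{\alpha}}(\mathbf{r})) \| p(\boldsymbol{\alpha}))$ is, the less the discrepancy between $p(\boldsymbol{\alpha})$ and $p(\hat{\boldsymbol{\alpha}}(\mathbf{r}))$ is; thus, the 
more likely $\hat{\alpha}_1(\mathbf{r}), \hat{\alpha}_2(\mathbf{r}), \ldots, \hat{\alpha}_p(\mathbf{r})$ are to be independent. Because the pdf of $\boldsymbol{\alpha}$ is 
generally unknown and needs to be estimated, $p(\boldsymbol{\alpha}) = \prod_{j=1}^{p} p(\alpha_j)$ in (15.3) is 
generally replaced by its estimate $p(\hat{\boldsymbol{\alpha}}(\mathbf{r})) \approx \prod_{j=1}^{p} p(\hat{\alpha}_j(\mathbf{r}))$. Substituting this estimate 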

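into (15.3) results in 

$$\begin{aligned}
D\Big(p(\hat{\boldsymbol{\alpha}}(\mathbf{r})) \,\Big\|\, p(\hat{\boldsymbol{\alpha}}(\mathbf{r})) = \textstyle\prod_{j=1}^{p} p(\hat{\alpha}_j(\mathbf{r}))\Big)
&= -\sum_{j=1}^{p} E\big[\log p(\hat{\alpha}_j(\mathbf{r}))\big] - H(\hat{\boldsymbol{\alpha}}(\mathbf{r})) \\
&= \sum_{j=1}^{p} H(\hat{\alpha}_j(\mathbf{r})) - H(\hat{\boldsymbol{\alpha}}(\mathbf{r}))
\end{aligned}$$ (15.4) 

where $E[\cdot]$ is the expectation with respect to $p(\hat{\boldsymbol{\alpha}}(\mathbf{r}))$ and $H(\hat{\alpha}_j(\mathbf{r}))$ is the entropy of the 
$j$-th estimated source $\hat{\alpha}_j(\mathbf{r})$. Unfortunately, even in this case, finding the pdf of $\hat{\boldsymbol{\alpha}}(\mathbf{r})$ is 
also difficult in practice. In order to mitigate this dilemma, Comon introduced an 
alternative criterion (Comon, 1994) that approximates the $D\big(p(\hat{\boldsymbol{\alpha}}(\mathbf{r})) \,\|\, \prod_{j=1}^{p} p(\hat{\alpha}_j(\mathbf{r}))\big)$ 
given in (15.4). Instead of minimizing $D\big(p(\hat{\boldsymbol{\alpha}}(\mathbf{r})) \,\|\, \prod_{j=1}^{p} p(\hat{\alpha}_j(\mathbf{r}))\big)$, Comon suggested 
maximizing the higher order statistics of the data through a so-called contrast function given by 

$$\psi(\mathbf{W}) = \sum_{j=1}^{p} \left( 4\kappa_{jjj}^{2} + \kappa_{jjjj}^{2} + 7\kappa_{jjj}^{4} - 6\kappa_{jjj}^{2}\kappa_{jjjj} \right)$$ (15.5) 

where $\kappa_{jjj}$ is the third order standardized cumulant of the $j$-th $\hat{\alpha}_j(\mathbf{r})$ for $1 \le j \le p$ 
representing the "skewness" of $p(\hat{\alpha}_j(\mathbf{r}))$ and $\kappa_{jjjj}$ is the fourth order standardized cumulant of the 
$j$-th $\hat{\alpha}_j(\mathbf{r})$ for $1 \le j \le p$ representing the "kurtosis" of $p(\hat{\alpha}_j(\mathbf{r}))$. If the skewness of 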
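$p(\hat{\alpha}_j(\mathbf{r}))$ is sufficiently large, (15.5) can be further approximated by $\sum_{j=1}^{p} \kappa_{jjj}^{2}$. On the 
other hand, if the kurtosis of $p(\hat{\alpha}_j(\mathbf{r}))$ is sufficiently large, a good approximation of 
(15.5) is $\sum_{j=1}^{p} \kappa_{jjjj}^{2}$. In either case, (15.5) is reduced to a much simpler criterion. 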
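
The contrast function can be evaluated directly from sample estimates of the standardized cumulants. The sketch below assumes the form of (15.5) read as the sum over components of 4κ₃² + κ₄² + 7κ₃⁴ - 6κ₃²κ₄, with κ₃ and κ₄ the skewness and kurtosis of each component; the function names and the test data are hypothetical.

```python
import numpy as np

def standardized_cumulants(y):
    """Return (skewness, kurtosis) of a 1-D sample as standardized cumulants."""
    z = (y - y.mean()) / y.std()
    k3 = np.mean(z**3)              # third order standardized cumulant
    k4 = np.mean(z**4) - 3.0        # fourth order standardized cumulant
    return k3, k4

def comon_contrast(Y):
    """Contrast of the form (15.5); each row of Y is one estimated component."""
    total = 0.0
    for y in Y:
        k3, k4 = standardized_cumulants(y)
        total += 4 * k3**2 + k4**2 + 7 * k3**4 - 6 * k3**2 * k4
    return total

rng = np.random.default_rng(5)
gauss = rng.normal(size=(1, 50000))      # Gaussian component: contrast near zero
laplace = rng.laplace(size=(1, 50000))   # heavy-tailed component: larger contrast

c_gauss = comon_contrast(gauss)
c_laplace = comon_contrast(laplace)
print(f"Gaussian: {c_gauss:.4f}   Laplacian: {c_laplace:.4f}")
```

A Gaussian component contributes almost nothing to the contrast, while a non-Gaussian one contributes a clearly positive amount, which is why maximizing the contrast favors non-Gaussian, target-like components.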

15.3.2 Learning Algorithm to Find Separating Matrix W 

Since the second-order statistics can be taken care of by decorrelation, the data 
vectors $\mathbf{r}$ are first pre-whitened prior to separation. In this case, the data are completely 
characterized by statistics of orders higher than 2. So, for simplicity we assume that 
the data vectors have been pre-whitened in this section. In order to derive a learning 
algorithm, we impose a constraint that the covariance matrix of the estimated abundance 
vector $\hat{\boldsymbol{\alpha}}(\mathbf{r})$ in (15.2) must be an identity matrix. To further simplify notation, we 
denote $\hat{\boldsymbol{\alpha}}(\mathbf{r})$ by $\mathbf{y}$ with $y_j = \hat{\alpha}_j(\mathbf{r})$. The learning algorithm to be developed must solve 
the following constrained optimization problem. 

maximize $\psi(\mathbf{W}) = \sum_{j=1}^{p} \left(E[y_j^{m}]\right)^{2}$ over $\mathbf{W}$ for $m \ge 3$ subject to $E[\mathbf{y}\mathbf{y}^T] = \mathbf{I}$ (15.6) 

where $\mathbf{I}$ is the $p \times p$ identity matrix. For such constrained problems, we use exterior 
penalty methods discussed in Cichocki and Unbehauen (1993) to eliminate some or all of 
the constraints. The idea is to add to the objective function specified by (15.6) so-called 
penalty function terms, which assign a higher cost to infeasible points. In our case, the 
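penalty function terms imposed on (15.6) are defined by 

$$U(h_{ij}(\mathbf{y})) \begin{cases} = 0, & \text{if } h_{ij}(\mathbf{y}) = 0 \\ > 0, & \text{if } h_{ij}(\mathbf{y}) \ne 0 \end{cases}$$ (15.7) 

where the constraints $h_{ij}(\mathbf{y})$ are defined by $h_{ij}(\mathbf{y}) = E[y_i y_j] - \delta_{ij}$, with $\delta_{ij} = 1$ if $i = j$ and 0 
otherwise. Now using the penalty 
function terms specified by (15.6) and (15.7) we can define a penalty function $P(\mathbf{y})$ by 

$$P(\mathbf{y}) = \sum_{i,j=1}^{p} U(h_{ij}(\mathbf{y})) = \frac{1}{2} \sum_{i,j=1}^{p} \lambda_{ij} \left(E[y_i y_j] - \delta_{ij}\right)^{2}$$ (15.8) 

where $\lambda_{ij} > 0$ is called a penalty parameter or penalty multiplier (typically all $\lambda_{ij} = \lambda$). 
Thus, maximizing the constrained problem described by (15.6) is equivalent to 
maximizing the following function, 

$$J(\mathbf{W}) = \psi(\mathbf{W}) - P(\mathbf{y}) = \psi(\mathbf{W}) - \frac{\lambda}{2} \sum_{i,j=1}^{p} \left(E[y_i y_j] - \delta_{ij}\right)^{2}$$ (15.9) 

which subtracts the penalty function given by (15.8) from the objective function in 
(15.6). In order to find the separating matrix $\mathbf{W} = [w_{ij}]$, we calculate the gradient of the 
function $J(\mathbf{W})$ by differentiating $\psi(\mathbf{W})$ and $P(\mathbf{y})$ with respect to $w_{ij}$, respectively: 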



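$$\frac{\partial \psi(\mathbf{W})}{\partial w_{ij}} = \frac{\partial}{\partial w_{ij}} \Big( \sum_{k=1}^{p} \left(E[y_k^{m}]\right)^{2} \Big) = 2m\, E[y_i^{m}]\, E[y_i^{m-1} r_j]$$ (15.10) 

$$\frac{\partial P(\mathbf{y})}{\partial w_{ij}} = \frac{\lambda}{2} \frac{\partial}{\partial w_{ij}} \sum_{k,l=1}^{p} \left(E[y_k y_l] - \delta_{kl}\right)^{2} = 2\lambda \sum_{k=1}^{p} \left(E[y_i y_k] - \delta_{ik}\right) E[y_k r_j]$$ (15.11) 

where $r_j$ is the $j$-th component of $\mathbf{r}$. (15.10) and (15.11) can be expressed in matrix form as follows. 

$$\nabla_{\mathbf{W}} \psi(\mathbf{W}) = 2m \boldsymbol{\Lambda} E[\mathbf{g}(\mathbf{y}) \mathbf{r}^T]$$ (15.12) 

$$\nabla_{\mathbf{W}} P(\mathbf{y}) = 2\lambda \left(E[\mathbf{y}\mathbf{y}^T] - \mathbf{I}\right) E[\mathbf{y}\mathbf{r}^T]$$ (15.13) 

where $\mathbf{g}(\mathbf{y}) = (y_1^{m-1}, y_2^{m-1}, \ldots, y_p^{m-1})^T$ and $\boldsymbol{\Lambda} = \mathrm{diag}\{E[y_i^{m}]\}$ is the diagonal matrix with 
the $i$-th diagonal element given by $E[y_i^{m}]$. From Eqs. (15.12) and (15.13), a learning 
algorithm to generate the desired separating matrix $\mathbf{W}$ can be designed by 

$$\mathbf{W}_{k+1} = \mathbf{W}_k + \mu \boldsymbol{\Lambda} E[\mathbf{g}(\mathbf{y}) \mathbf{r}^T] - \eta \left(E[\mathbf{y}\mathbf{y}^T] - \mathbf{I}\right) E[\mathbf{y}\mathbf{r}^T]$$ (15.14) 

where $\mu$ and $\eta$ are learning parameters in (15.12) and (15.13) respectively. It was found 
empirically that $\mu$ controls the convergence speed and is generally less than 1 while $\eta$ 
controls the constraint and is generally greater than $\mu$. 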
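
A single iteration of the learning rule can be sketched as follows. This is only an illustration under stated assumptions: the update is taken to be W + μ Λ E[g(y) rᵀ] - η (E[y yᵀ] - I) E[y rᵀ], expectations are replaced by sample averages over pre-whitened pixels, the constant factors 2m and 2λ are absorbed into μ and η, and the function name `lsrma_update` and the toy data are hypothetical. No claim is made about convergence for a particular choice of μ and η.

```python
import numpy as np

def lsrma_update(W, Z, m=4, mu=0.5, eta=1.0):
    """One update of the form (15.14) on whitened pixels Z (L x N).

    m = 3 corresponds to the skewness criterion and m = 4 to kurtosis.
    """
    N = Z.shape[1]
    p = W.shape[0]
    Y = W @ Z                                  # y = W r for every pixel (p x N)
    Lam = np.diag(np.mean(Y**m, axis=1))       # Lambda = diag{E[y_i^m]}
    E_g_rT = (Y**(m - 1)) @ Z.T / N            # E[g(y) r^T], g(y)_i = y_i^(m-1)
    E_yyT = Y @ Y.T / N                        # E[y y^T]
    E_y_rT = Y @ Z.T / N                       # E[y r^T]
    return W + mu * Lam @ E_g_rT - eta * (E_yyT - np.eye(p)) @ E_y_rT

# Toy data: 3 hypothetical Laplacian sources mixed into 6 bands, then whitened.
rng = np.random.default_rng(2)
S = rng.laplace(size=(3, 4000))
R = rng.normal(size=(6, 3)) @ S + 0.05 * rng.normal(size=(6, 4000))
Rc = R - R.mean(axis=1, keepdims=True)
eigval, eigvec = np.linalg.eigh(Rc @ Rc.T / Rc.shape[1])
Z = (eigvec / np.sqrt(eigval)).T @ Rc          # sample covariance of Z is I

W0 = np.eye(3, 6)                              # p x L start with orthonormal rows
W1 = lsrma_update(W0, Z, m=4, mu=0.01, eta=1.0)
print(W1.shape)
```

When the rows of W are orthonormal and the data are whitened, E[y yᵀ] equals the identity and the penalty term vanishes, so the update is driven by the higher-order term alone; once the covariance of y drifts away from the identity, the η term pulls it back.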



15.4 EXPERIMENTS 

Since there is no prior knowledge about the signal sources assumed in ICA, the 
number of targets, p, resident in an image scene must be estimated from the image data. 
For multispectral imagery, L is not large and p can be estimated by exhausting all the 
components. However, this may also have a drawback. On some occasions, p > L, which 
results in an over-complete ICA (Lee et al., 1999). In hyperspectral imagery, the number 
of bands is usually in the hundreds. In this case, p is much smaller than L, i.e. $p \ll L$, 
and selecting an appropriate p for LSRMA becomes crucial. In the past, many criteria 
such as an information theoretic criterion (AIC) (Akaike, 1974), minimum description 
length (MDL) (Schwarz, 1977; Rissanen, 1978; Rissanen, 1983), network information 
criterion (NIC) (Murata et al., 1994) and transformed Gershgorin radii (TGR) (Wu et al., 
1995) have been proposed for estimating the number of signals in sensor array 
processing. However, these criteria require prior knowledge of likelihood functions, 
generally of Gaussian functional forms. In addition, in all these criteria, the noise is 
assumed to be independent and identically distributed, an assumption that is generally not true in 
remotely sensed images. Recently, the TGR was modified into a noise adjusted 
transformed Gershgorin disk (NATGD) method to estimate p (Tu, 2000), in which p was determined 
from a graphical plot. However, Chiang et al. (2000) showed by experiments that p was 
underestimated by the NATGD. Since p is closely related to the virtual 
dimensionality (VD) in Chapter 17, p is estimated by VD + 1 because at 
least one component is required to accommodate the noise in LSRMA. The issue of 
VD estimation will be postponed to Chapter 17. 

In the following sections, two sets of real hyperspectral image data, AVIRIS in Fig. 
1.6(a) and 15-panel HYDICE scene in Fig. 1.7(a) were used for experiments to 
demonstrate the performance of LSRMA with skewness (m = 3 in (15.6)) and kurtosis 
(m = 4 in (15.6)) used as optimal criteria. 

15.4.1 AVIRIS Image Experiments 

It was demonstrated in Harsanyi and Chang (1994) that there were five targets of 
interest: cinders, rhyolite, playa (dry lake), vegetation and shade. So, at least five 
components, $p \ge 5$, are needed to classify and separate these five targets. However, this 
was obtained by visual inspection supported by ground truth. If we assume that no prior 
knowledge is available about the image scene, the number of targets in an image scene, p 
must be estimated from the data. As shown in Chapter 6, there was a single anomaly of 
two-pixel size located at the top edge of the lake shown inside a circle marked in Fig. 
6.1(b). This anomaly cannot be seen or detected visually from the scene. In order for 
LSRMA to detect this anomaly along with the five targets, the p must be greater than 5 
so that the cinders, rhyolite, playa dry lake, vegetation, shade and the anomaly can be 
detected and classified in 6 separate components. If we use the three methods proposed in 
Chapter 17, i.e. the HFC, modified HFC and NSP methods, the VD is estimated to be 4, 
5 and 8 respectively. So, in this case the corresponding p = VD + 1 will be 5, 6 and 9 
with one extra component included to accommodate the noise. In Tu (2000) the same 
image scene in Fig. 1.6(a) was considered and p was estimated by the 
proposed NATGD to be 4. Unfortunately, with p = 4 the shade and the anomaly were 
not detected. In order to demonstrate the impact of different values of p on the 
performance, we conducted experiments with p = 4 estimated by NATGD, p = 5 by 
HFC, p = 6 by modified HFC and p = 9 by NSP respectively. The learning parameters 
$\mu$ and $\eta$ used in (15.14) for the following experiments were empirically set to $\mu = 0.5$ 
and $\eta = 1$. 

Example 15.1 (p = 4) 

Fig. 15.1(a-d) shows the results generated by LSRMA using skewness as a criterion. 
Only the cinders, vegetation and center of playa were classified in Fig. 15.1(a-c) and the 
noise component was shown in the 4-th component. It missed the detection of the 
rhyolite, shade and part of playa. 







Fig. 15.2(a-d) shows the results produced by LSRMA using kurtosis where only 
the anomaly, the vegetation and the bottom part of playa were classified in Fig. 15.2(a-c) 
and the noise was represented by the 4-th component. It also missed the detection of the 
cinders, rhyolite, shade and a large portion of playa. Interestingly, Fig. 15.2(a) detected 
the anomaly observed in Fig. 6.1(b). This experiment demonstrated that the skewness 
and the kurtosis performed very differently, but kurtosis is more effective in detecting small 
targets. 







Example 15.2 (p = 5) 

In this example, we conducted similar experiments to Example 15.1 with p = 5 
where Figs. 15.3 and 15.4 were results obtained for skewness and kurtosis respectively. 
Comparing Fig. 15.3 to Fig. 15.1, little has been changed in results for skewness except 
that the 5-th component showed nothing but noise. 

However, when the kurtosis was used, there was a drastic change between Figs. 15.2 
and 15.4. Both detected the anomaly in their first components shown in Fig. 15.2(a) and 
Fig. 15.4(a). However, the cinders and shade were extracted in Fig. 15.4(c-d) respectively, 
which were missed in Fig. 15.2. Unfortunately, it still also missed the detection of the 
rhyolite and the playa. 




Example 15.3 (p = 6) 

Figs. 15.5 and 15.6 show the results produced by LSRMA using skewness and kurtosis 
for p = 6. Neither criterion performed well in this case either. In Fig. 15.5, the skewness missed 
the anomaly, vegetation, rhyolite and a large portion of dry lake. On the other hand, the 
kurtosis did detect the anomaly, but also missed vegetation and part of dry lake. 








Example 15.4 (p = 9) 

Similar experiments were also conducted for p = 9. Figs. 15.7 and 15.8 show the 
results produced by LSRMA using skewness and kurtosis respectively, where the 
skewness and kurtosis demonstrated their different strengths in target extraction. The 
cinders, the vegetation, the playa, the rhyolite and the shade were detected by skewness 
in the first 6 components shown by Fig. 15.7(a-f). Interestingly, the playa was detected 
in two separate components in Figs. 15.7(c) and 15.7(e) and the anomaly was barely 
detected along with the shade in Fig. 15.7(f). 




By contrast, kurtosis extracted the anomaly, vegetation and cinders in the first 3 
components shown by Fig. 15.8(a-c) while the playa was extracted in the next three 
separate components in Fig. 15.8(d-f). Unfortunately, the rhyolite and the shade, which 
were extracted in Figs. 15.7(c) and 15.7(f) by the skewness were not detected by the 
kurtosis. Since the playa covers a very large area of the image scene, the dynamic range 
of its spectral variability is relatively large. Therefore, the playa was classified in three 
separate components with the center detected in Figs. 15.7(c) and 15.8(d), the edge 
detected in Fig. 15.8(e), and the bottom detected in Figs. 15.7(e) and 15.8(f). 
Interestingly, there was an interferer picked up in Fig. 15.7(h), which was not shown in 
any of the previous images. 




This phenomenon was also observed in Chang and Heinz (2000) and Heinz and 
Chang (2001). Such subtle spectral variations were overlooked in Harsanyi and Chang 
(1994) because the playa signature was obtained by averaging a large dry lake area by 
visual inspection. As a consequence, the entire playa was extracted, and the anomaly was 
averaged out and could not be detected. A similar problem was also found in Tu (2000) 
where the lake was detected as an entity, and both the shade and anomaly were not 
detected. This resulted from the fact that the estimated p was too small. 

To conclude the AVIRIS experiments, various values of p were also tested and 
evaluated by skewness and kurtosis. The results for p = 4, 5, 6, 7, 8 were not as good as 
those obtained with p = 9. When p increased from 10 to 12, the performance of both 
criteria was not improved significantly except that different portions of the lake and 
Gaussian sources were detected in the additional components. In order to see whether or 
not the performance of LSRMA can be improved with p = L, Figs. 15.9 and 15.10 show 
the results of LSRMA for p = L = 158 using skewness and kurtosis respectively, where 
playa and vegetation were detected and classified by skewness in three and two separate 
components in Fig. 15.9(b,f,g) and Fig. 15.9(c,e) respectively. 







As we can see from Figs. 15.9 and 15.10, increasing p did not necessarily improve 
detection and classification performance. In Fig. 15.9, skewness missed the detection of 
the anomaly and the shade compared to Fig. 15.10 where the kurtosis missed the 
detection of the shade as well as a large portion of the lake. 

The above experiments demonstrated that skewness may be a good criterion for 
classification of large areas while the kurtosis may be effective in extracting small targets 
or insignificant targets. 

In general, the number of components produced by LSRMA should be equal to p 
where each component accommodates one signal source. Additionally, a different p 
should also generate a different set of component images. However, it is interesting to 
note that according to our experiments, if p exceeds a certain number, the difference 
(e.g. Euclidean distance) between two consecutive projection vectors generated by 
LSRMA is very small. In this case, LSRMA will stop generating additional components 
and is said to converge to this number, denoted by $N_{\mathrm{LSRMA}}(p)$, which is 
usually much smaller than p. Table 15.1 tabulates $N_{\mathrm{LSRMA}}(p)$ generated by LSRMA 
using various p. 



Table 15.1 $N_{\mathrm{LSRMA}}(p)$ 







P = ^ 


p = 6 p = 9 p=\22 


p= 151 


p = 158 


skewness 


4 


5 


6 4 122 


151 


9 


kurtosis 


4 


5 


6 139 18 


151 


10 



For instance, for p = 158 we were supposed to have 159 components. However, after 
the 9th (skewness) or 10th (kurtosis) component was generated, the subsequent 
components were very similar to the 9th (skewness) or 10th (kurtosis) component within very 
small least squares errors. So, $N_{\mathrm{LSRMA}}(p)$ for skewness and kurtosis are 9 and 10 
respectively. It should also be noted that if p is less than $N_{\mathrm{LSRMA}}(p)$, a different value of p 
will generate a completely different set of component images. This suggests that finding an 
appropriate $N_{\mathrm{LSRMA}}(p)$ for LSRMA could be very challenging. The VD proposed in Chapter 
17 may be a good estimate for $N_{\mathrm{LSRMA}}$. 

In all the experiments conducted in this chapter, the difference between two 
projection vectors is measured by Euclidean distance and the error tolerance $\varepsilon$ was set to 
0.1. It turns out that the error threshold was rather robust: $N_{\mathrm{LSRMA}}$ was the same for $\varepsilon$ 
ranging from 0.1 to 0.001. 
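
The stopping rule described above can be sketched as a simple count. The code below is one possible reading of it, assumed for illustration: projection vectors are taken to be the rows of W in the order they were generated, and generation is considered to have converged as soon as two consecutive rows differ by less than the tolerance in Euclidean distance. The function name `n_lsrma` and the example matrix are hypothetical.

```python
import numpy as np

def n_lsrma(W, eps=0.1):
    """Number of projection vectors generated before two consecutive rows
    of W differ by less than eps in Euclidean distance."""
    for k in range(1, W.shape[0]):
        if np.linalg.norm(W[k] - W[k - 1]) < eps:
            return k            # rows 0..k-1 are distinct; row k repeats row k-1
    return W.shape[0]           # no repetition found: all rows count

# Hypothetical projection vectors: the 4th and 5th rows nearly repeat the 3rd.
W = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0],
              [0.0, 0.01, 1.0],
              [0.0, 0.0, 1.0]])

count = n_lsrma(W, eps=0.1)
print(count)
```

With a much smaller tolerance the near-duplicate rows would no longer be treated as repetitions, so in practice the count should be checked over a range of tolerances, as was done for the range 0.1 to 0.001 above.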

15.4.2 HYDICE Image Experiments 

Since the HYDICE scene in Fig. 1.7(a) provides a ground truth map in Fig. 1.7(b), 
we can take advantage of target center pixels to simulate synthetic images to evaluate the 
performance of LSRMA. 

Example 15.4 (Computer Simulations) 

In this experiment, we simulated a scene of size 50 x 50 that is similar to Fig. 
1.7(a). It consists of 25 single-pixel panels simulated by P1, P2, P3, P4 and P5 in Fig. 
1.8. These 25 simulated pixels are shown in Fig. 15.11(a) and arranged in 5 rows with 5 
pixels in each row. 

The 5 pixels in the same row were simulated by the same signature with different 
abundance fractions, 1.0, 0.8, 0.6, 0.4, 0.2 respectively, starting from the first column to 
the 5-th column. In this case, the mixing matrix $\mathbf{M}$ is formed by 
$\mathbf{M} = [\mathrm{P1}\ \mathrm{P2}\ \mathrm{P3}\ \mathrm{P4}\ \mathrm{P5}]$ and the abundance vector $\boldsymbol{\alpha}$ is assigned by 
$\boldsymbol{\alpha} = (1.0, 0.8, 0.6, 0.4, 0.2)^T$. 

In addition to these 25 simulated pixels, we also added zero-mean Gaussian noise 
with different variances to achieve different levels of signal-to-noise ratio (SNR). Fig. 
15.11(b) shows the resulting image with zero-mean Gaussian noise added to the 
synthetic image in Fig. 15.11(a). 
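
A scene of this kind can be sketched as follows. Several details are assumptions made only for illustration: the five signatures and the background are random placeholders rather than the actual P1 to P5 spectra, the number of bands and the panel positions are arbitrary, the part of a panel pixel not occupied by the panel signature is filled with background, and SNR is taken as the ratio of mean signal power to noise variance in dB, which may differ from the definition used for the experiments.

```python
import numpy as np

rng = np.random.default_rng(3)
L, size = 100, 50                          # hypothetical band count; 50 x 50 scene

P = rng.uniform(0.1, 0.9, size=(5, L))     # placeholders for signatures P1..P5
b = rng.uniform(0.1, 0.9, size=L)          # placeholder background signature
fractions = [1.0, 0.8, 0.6, 0.4, 0.2]      # abundance per column, left to right

scene = np.tile(b, (size, size, 1))        # background in every pixel
abund = np.zeros((size, size, 5))          # ground-truth abundance maps
rows = cols = [5, 15, 25, 35, 45]          # hypothetical panel positions
for i, r in enumerate(rows):               # row i uses signature P[i]
    for a, c in zip(fractions, cols):
        scene[r, c] = a * P[i] + (1.0 - a) * b
        abund[r, c, i] = a

def add_noise(cube, snr_db, rng):
    """Add zero-mean white Gaussian noise at a given SNR (power ratio in dB)."""
    signal_power = np.mean(cube**2)
    sigma = np.sqrt(signal_power / 10.0 ** (snr_db / 10.0))
    return cube + rng.normal(scale=sigma, size=cube.shape)

noisy = add_noise(scene, snr_db=20.0, rng=rng)
measured = 10.0 * np.log10(np.mean(scene**2) / np.mean((noisy - scene) ** 2))
print(f"measured SNR: {measured:.2f} dB")
```

The ground-truth abundance maps kept in `abund` make it possible to compare the detected abundance fractions against the simulated ones at each noise level.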

In the following simulations, 6 scenarios with SNR = 30dB, 25dB, 20dB, 15dB, 
10dB and 5dB were conducted to evaluate the performance of LSRMA using skewness 
and kurtosis. In the skewness case, $N_{\mathrm{LSRMA}}(5) = 5$ for SNR = 30dB, 25dB, 20dB, 15dB and 
10dB, and all 25 panel pixels were detected and accurately classified into their own 
classes in these five components. For the case of SNR = 5dB, $N_{\mathrm{LSRMA}}(5) = 4$ and the 
panel pixels in rows 2 and 3 were detected but forced to be classified into one class. This 
is because the spectra of P2 and P3 are very similar, as demonstrated previously. For 
comparison, only results for SNR = 5dB and 10dB are shown in Figs. 15.12 and 15.13 
respectively, where the amounts of detected abundance fractions are also plotted for 
reference. 

Unlike skewness, LSRMA using kurtosis produced a very large number of 
components for SNR = 30dB, 25dB, 20dB, 15dB, 10dB and 5dB, but all the 25 panel 
pixels were detected and accurately classified into their own classes in the first five 
components. Since the results for all cases were similar except for different amounts of 
detected abundance fractions, only the results for the case of SNR = 5dB are shown in Fig. 
15.14. 




It is worth noting that interferers and background signatures were also detected 
with relatively small abundance fractions in the components beyond the first five 
components. Furthermore, as SNR increased, so did N_LSRMA(p), but the detected 
abundance fractions were more accurate and closer to the true abundance fractions. As a 
result, more interferers were detected, each of which was classified into a 
separate component. According to the above computer simulations, 
kurtosis was very effective but also sensitive to target detection. In addition, the 
performance of both criteria improved as SNR increased. 

Example 15.5 (15-panel HYDICE scene) 

In this example, we used the 15-panel HYDICE scene in Fig. 1.7(a) to conduct 
experiments for LSRMA using skewness and kurtosis. The number of mixing targets 
was estimated to be p = 14, 16, 21, 125 and 86 using p = VD + 1 where the VD was 
obtained from Table 17.3 in Chapter 17. Interestingly, in these experiments, the interferer 
in the left corner and the 15 panels were all picked up and classified in the first 6 
components and their results were all very similar. The only difference is that a different 
p produced a different number of components for LSRMA. The components beyond the 
first six generally extracted natural background signatures such as grass, trees 
or road, and noise. Table 15.2 tabulates the corresponding values of N_LSRMA(p) generated 
by LSRMA. It is worth noting from Table 15.2 that a higher value of p does not 
necessarily imply a larger N_LSRMA(p). 



Table 15.2. N_LSRMA(p) produced by different values of p 

            p = 14   p = 16   p = 21   p = 86   p = 125   p = 169
skewness         9        8       17       12         6         6
kurtosis         8        6        9       11        13        12



Experiments for p ranging from 6 to 30 were also conducted for comparison. Figs. 
15.15-15.18 show that both skewness and kurtosis detected the panels in row 2 effectively 
with p = 14 and 16. We also conducted experiments with p = 15 for both skewness 
and kurtosis. In this case, neither detected the panels in row 2. However, after p 
reached a certain number, the results were similar and did not differ very much in terms 
of detecting the 15 panels. For the 15-panel scene, we found this number to be 17 for 
skewness and 18 for kurtosis, in which case the performance of LSRMA was very robust 
and stable. Apparently, the VD in Chapter 17 provides a good estimate of this number as 
shown in Tables 15.1 and 15.2. 




As demonstrated in all the figures, LSRMA was able to detect all 15 panels in 
the second through sixth components when p = 21. The strong interferer located in the 
upper left corner in the forest of Fig. 1.7(a) was also detected by both skewness and 
kurtosis in their first components. This interferer cannot be identified by visual 
inspection of the scene in Fig. 1.7(a). Its detection in Figs. 
15.15(a)-15.20(a) demonstrates that LSRMA can be used as an anomaly detector to 
detect unknown targets. However, as also shown, it was unable to discriminate panels in 
row 2 from those in row 3. The same phenomenon was also witnessed in Fig. 15.12(a) 
by the computer simulation using skewness with SNR = 5dB. For 
comparison, we also conducted similar experiments for p = 169. The results for skewness 
and kurtosis are shown in Figs. 15.21 and 15.22 respectively, where the interferer and all 
15 panels were detected and classified into 6 separate components. As in the previous 
experiments, both skewness and kurtosis had the same difficulty discriminating 
panels in row 2 from those in row 3. 

It should be noted that N_LSRMA(169) was 6 for skewness and 12 for kurtosis. 
Like the AVIRIS image experiments, this also implies that when p is greater than the 
number of targets (18 in this example), a large value of p does not necessarily generate a 
large N_LSRMA(p). 

15.5 3-D ROC ANALYSIS FOR LSRMA 

Since LSRMA is unsupervised, an automatic thresholding method is required to 
segment targets from background. In this section, we use the thresholding technique 
developed in Section 14.4 for this purpose. 

Let α̂_LSRMA(r) be the abundance fraction of an image pixel r resulting from 
LSRMA, which represents the gray level value of r. Because α̂_LSRMA(r) 
cannot detect the true amount of target abundance fraction, its value is generally a real 
number and does not necessarily lie in the range [0,1]. For any given value α, we use the 
rejection probability, P(α) = Pr(R(α)) defined by (14.6), along with a prescribed 
confidence coefficient γ specified by 

$$P(\alpha_0) = \gamma \tag{15.15}$$

to determine a desired threshold value α_0. More precisely, assume that the confidence 
coefficient is set to γ with P(α_0) = γ. If α̂_LSRMA(r) > α_0, r will be detected as a target 
pixel, and as a background pixel otherwise. It should be noted that the confidence 
coefficient can be adjusted and is determined by target size. For small targets such as 
the panels in Fig. 1.7(a), a panel of size 3m x 3m with pixel resolution 1.5m would have at 
most 4 pixels. So, the ratio of a panel with 4 pixels to the entire image size of 
64 x 64 = 4096 pixels is no more than 0.001. In this case, a reasonable estimate of the 
confidence coefficient would be approximately γ = 1 - 0.001 = 0.999. As an example, 
Tables 15.3(a-c) tabulate the values of threshold α_0 yielded by the confidence coefficient 
γ = 0.999, 0.998, 0.997 where Pi represents detection of panels in row i. 



Table 15.3(a). Values of threshold α_0 yielded by confidence coefficient γ = 0.999 via (15.15) 

              P1        P2        P3        P4        P5
skewness   15.3502   17.7059   17.9903   19.7057   17.6416
kurtosis   14.7830   17.2765   17.4983   19.1237   16.6457

Table 15.3(b). Values of threshold α_0 yielded by confidence coefficient γ = 0.998 via (15.15) 

              P1        P2        P3        P4        P5
skewness   10.8525    7.2989    6.8064    5.2319    6.4973
kurtosis   10.3464    6.7873    6.7814    5.3131    6.1286

Table 15.3(c). Values of threshold α_0 yielded by confidence coefficient γ = 0.997 via (15.15) 

              P1        P2        P3        P4        P5
skewness    4.9019    5.7652    4.0263    3.6379    4.3647
kurtosis    4.7596    5.2888    4.4243    3.6922    4.1389
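
The thresholding rule of (15.15) can be illustrated with the short sketch below. It is written under an explicit assumption: since (14.6) is not reproduced here, the rejection probability is approximated by the empirical fraction of pixels whose abundance value does not exceed the candidate threshold. The function names and the toy abundance values are hypothetical.

```python
import math


def confidence_from_target_size(target_pixels, image_pixels, digits=3):
    """gamma = 1 - (target size / image size), rounded to `digits` decimals."""
    return round(1.0 - target_pixels / image_pixels, digits)


def threshold_for_confidence(abundances, gamma):
    """Smallest observed value a0 whose empirical rejection probability
    (assumed here: fraction of pixels with value <= a0) is at least gamma."""
    ordered = sorted(abundances)
    k = math.ceil(gamma * len(ordered))      # number of pixels to reject
    k = min(max(k, 1), len(ordered))
    return ordered[k - 1]


def detect(abundances, a0):
    """Binary detection map: 1 for target (value > a0), 0 for background."""
    return [1 if a > a0 else 0 for a in abundances]


# A 4-pixel panel in a 64 x 64 image, as in the text.
gamma = confidence_from_target_size(4, 64 * 64)

# Toy abundance image: 4096 distinct gray levels (hypothetical values).
abundances = [float(i) for i in range(64 * 64)]
a0 = threshold_for_confidence(abundances, gamma)
print(gamma, a0, sum(detect(abundances, a0)))
```

Under this assumption, lowering γ lowers the threshold and lets more pixels through, which matches the trend of the threshold values across Tables 15.3(a-c).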



Tables 15.4(a-b)-15.6(a-b) tally the number of panel R and Y pixels detected in Figs. 
15.19 and 15.20 with γ = 0.999, 0.998 and 0.997 respectively, where N_R, N_RD, N_Y, N_YD, 
N_R+Y, N_(R+Y)D, N_FA and N_M were defined in Section 9.3. As we can see from these tables, 
skewness and kurtosis performed very similarly and their results were nearly the same. 
When γ = 0.999, both skewness and kurtosis achieved a 0% false alarm rate, but they also 
missed 6 R pixels. On the other hand, when γ was decreased to 0.997, both skewness 
and kurtosis detected all 19 R pixels at the expense of falsely detecting 9 and 7 pixels 
respectively. 

Table 15.4(a). Tally of panel R-pixel detection by LSRMA using skewness with γ = 0.999 

          N_R+Y   N_R   N_Y   N_RD   N_YD   N_(R+Y)D   N_M   N_FA
P1           50     3    47      2      3          5    45      0
P2           45     4    41      3      1          4    41      0
P3           41     4    37      2      2          4    37      0
P4           44     4    40      3      2          5    39      0
P5           43     4    39      3      1          4    39      0
Total       223    19   204     13      9         22   201      0

Table 15.4(b). Tally of panel R-pixel detection by LSRMA using kurtosis with γ = 0.999 

          N_R+Y   N_R   N_Y   N_RD   N_YD   N_(R+Y)D   N_M   N_FA
P1           50     3    47      2      2          4    46      0
P2           45     4    41      3      1          4    41      0
P3           41     4    37      2      2          4    37      0
P4           44     4    40      3      2          5    39      0
P5           43     4    39      3      2          5    38      0
Total       223    19   204     13      9         22   201      0


Table 15.5(a). Tally of panel R-pixel detection by LSRMA using skewness with γ = 0.998 

          N_R+Y   N_R   N_Y   N_RD   N_YD   N_(R+Y)D   N_M   N_FA
P1           50     3    47      2      6          8    42      0
P2           45     4    41      3      3          6    39      2
P3           41     4    37      4      4          8    33      0
P4           44     4    40      4      5          9    35      0
P5           43     4    39      3      6          9    34      0
Total       223    19   204     16     24         40   183      2


Table 15.5(b). Tally of panel R-pixel detection by LSRMA using kurtosis with γ = 0.998 

          N_R+Y   N_R   N_Y   N_RD   N_YD   N_(R+Y)D   N_M   N_FA
P1           50     3    47      2      6          8    42      0
P2           45     4    41      4      3          7    38      1
P3           41     4    37      4      4          8    33      0
P4           44     4    40      4      5          9    35      0
P5           43     4    39      3      5          8    35      0
Total       223    19   204     17     23         40   183      1


Table 15.6(a). Tally of panel R-pixel detection by LSRMA using skewness with γ = 0.997 

          N_R+Y   N_R   N_Y   N_RD   N_YD   N_(R+Y)D   N_M   N_FA
P1           50     3    47      3     10         13    37      0
P2           45     4    41      4      6         10    35      3
P3           41     4    37      4      5          9    32      4
P4           44     4    40      4      7         11    33      1
P5           43     4    39      4      8         12    31      1
Total       223    19   204     19     36         55   168      9


Table 15.6(b). Tally of panel R-pixel detection by LSRMA using kurtosis with γ = 0.997 

          N_R+Y   N_R   N_Y   N_RD   N_YD   N_(R+Y)D   N_M   N_FA
P1           50     3    47      3      9         12    38      0
P2           45     4    41      4      6         10    35      2
P3           41     4    37      4      5          9    32      3
P4           44     4    40      4      8         12    32      1
P5           43     4    39      4      7         11    32      1
Total       223    19   204     19     35         54   169      7

If we further use γ = 0.999, 0.998 and 0.997 to threshold the images in Figs. 15.19(b-f) 
and 15.20(b-f), Figs. 15.23(a-b)-15.25(a-b) show their corresponding binary images 
produced by the threshold values of α_0 given in Tables 15.3(a-c). 

As we can see from Fig. 15.23(a-b), the 5 R panel pixels (i.e., p_13, p_23, p_33, p_43, p_53, 
one for each row in Fig. 1.7(b)) in the third column were not detected with γ = 0.999. 
When γ = 0.998 was used, p_23, p_33, p_43 were extracted in Fig. 15.24(a-b) while some false 
alarm pixels also showed up. As γ was further decreased to 0.997, the 5 R panel 
pixels, p_13, p_23, p_33, p_43, p_53, were effectively detected in Fig. 15.25(a-b) by both 
skewness and kurtosis at the expense of more false alarm pixels. It should be noted that 
these panel pixels have a size of 1m x 1m, which is smaller than the 1.5m pixel 
resolution. As a result, they cannot be seen by visual inspection of Fig. 1.7(a). This 
example further demonstrates the ability of LSRMA in subpixel detection. As also noted in 
Fig. 15.25(a-b) and Tables 15.6(a)-15.6(b), LSRMA falsely detected a panel pixel in 
row 3 when it intended to detect panels in row 2, and vice versa, because the panel 
signatures in rows 2 and 3 are very close. The same was true for detecting panels 
in rows 4 and 5. These phenomena occurred due to the lack of prior target knowledge. 
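
The tallies in Tables 15.4-15.6 obey simple bookkeeping identities that can be checked mechanically. The sketch below encodes them. The interpretation of the counts (detected, missed and false alarm pixels) is inferred from the arithmetic of the tables and from the discussion above, not from the definitions in Section 9.3, so it should be read as an assumption; the class and method names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Tally:
    n_ry: int    # N_R+Y: total panel pixels (R and Y)
    n_r: int     # N_R: R pixels
    n_y: int     # N_Y: Y pixels
    n_rd: int    # N_RD: R pixels detected
    n_yd: int    # N_YD: Y pixels detected
    n_ryd: int   # N_(R+Y)D: R and Y pixels detected
    n_m: int     # N_M: panel pixels missed
    n_fa: int    # N_FA: false alarm pixels

    def is_consistent(self):
        """Identities observed in every row of the tally tables."""
        return (self.n_r + self.n_y == self.n_ry
                and self.n_rd + self.n_yd == self.n_ryd
                and self.n_ryd + self.n_m == self.n_ry)

    def r_detection_rate(self):
        """Fraction of R pixels detected."""
        return self.n_rd / self.n_r


# "Total" rows of Table 15.4(a) (gamma = 0.999) and Table 15.6(a) (gamma = 0.997).
skewness_999 = Tally(223, 19, 204, 13, 9, 22, 201, 0)
skewness_997 = Tally(223, 19, 204, 19, 36, 55, 168, 9)

for tally in (skewness_999, skewness_997):
    print(tally.is_consistent(), tally.r_detection_rate(), tally.n_fa)
```

The two totals reproduce the trade-off described in the text: 6 of the 19 R pixels are missed with no false alarms at γ = 0.999, while all 19 are found at the cost of 9 false alarms at γ = 0.997.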

In order to evaluate the performance of LSRMA, OSP and CEM were used for 
comparative analysis. This selection was made for two reasons. One is that both 
OSP and CEM have shown success in target detection and classification in Chapters 3, 
4, 8 and 9. The other is the level of target knowledge used. OSP requires complete 
target knowledge, as opposed to CEM, which only needs knowledge of the desired target of 
interest. Compared to OSP and CEM, LSRMA does not require target knowledge a 
priori. As a matter of fact, if the number of signal sources, p, can be estimated reliably, 
LSRMA does not need any information at all. 

Tables 15.7 and 15.8 tally the results produced by OSP and CEM using the 
same confidence coefficients γ = 0.999, 0.998, 0.997, where OSP used the five panel 
signatures, P1, P2, P3, P4, P5 in Fig. 1.8, as its complete target knowledge and CEM 
only used Pi as its desired target knowledge to detect panels in row i. 



Table 15.7(a). Tally of panel R-pixel detection by OSP with γ = 0.999 

          N_R+Y   N_R   N_Y   N_RD   N_YD   N_(R+Y)D   N_M   N_FA
P1           50     3    47      0      0          0    50      5
P2           45     4    41      0      0          0    45      4
P3           41     4    37      0      0          0    41      5
P4           44     4    40      0      0          0    44      4
P5           43     4    39      0      0          0    43      4
Total       223    19   204      0      0          0   223     22




Table 15.7(b). Tally of panel R-pixel detection by OSP with γ = 0.998 

          N_R+Y   N_R   N_Y   N_RD   N_YD   N_(R+Y)D   N_M   N_FA
P1           50     3    47      0      0          0    50      8
P2           45     4    41      0      0          0    45      8
P3           41     4    37      0      0          0    41      8
P4           44     4    40      0      0          0    44      9
P5           43     4    39      1      0          1    42      8
Total       223    19   204      1      0          1   222     41




Table 15.7(c). Tally of panel R-pixel detection by OSP with γ = 0.997 

          N_R+Y   N_R   N_Y   N_RD   N_YD   N_(R+Y)D   N_M   N_FA
P1           50     3    47      0      0          0    50     13
P2           45     4    41      0      0          0    45     15
P3           41     4    37      0      0          0    41     12
P4           44     4    40      0      0          0    44     14
P5           43     4    39      1      0          1    42     12
Total       223    19   204      1      0          1   222     66

Table 15.8(a). Tally of panel R-pixel detection by CEM with γ = 0.999 

          N_R+Y   N_R   N_Y   N_RD   N_YD   N_(R+Y)D   N_M   N_FA
P1           50     3    47      2      3          5    45      0
P2           45     4    41      3      2          5    40      0
P3           41     4    37      3      2          5    36      0
P4           44     4    40      3      2          5    39      0
P5           43     4    39      3      1          4    39      0
Total       223    19   204     14     10         24   199      0




Table 15.8(b). Tally of panel R-pixel detection by CEM with γ = 0.998 

          N_R+Y   N_R   N_Y   N_RD   N_YD   N_(R+Y)D   N_M   N_FA
P1           50     3    47      2      7          9    41      0
P2           45     4    41      4      4          8    37      1
P3           41     4    37      4      4          8    33      0
P4           44     4    40      4      5          9    35      0
P5           43     4    39      4      4          8    35      0
Total       223    19   204     18     24         42   181      1




Table 15.8(c). Tally of panel R-pixel detection by CEM with γ = 0.997 

          N_R+Y   N_R   N_Y   N_RD   N_YD   N_(R+Y)D   N_M   N_FA
P1           50     3    47      3      9         12    38      0
P2           45     4    41      4      7         11    34      2
P3           41     4    37      4      7         11    30      1
P4           44     4    40      4      7         11    33      2
P5           43     4    39      4      9         13    30      0
Total       223    19   204     19     39         58   165      5



Interestingly, if we compare Tables 15.4-15.6 to Tables 15.8(a)-15.8(c), LSRMA 
performed as well as CEM and their results were nearly the same even though LSRMA did 
not assume any target knowledge. On the other hand, Tables 15.7(a-c) show that OSP 
performed extremely poorly. This is because the target knowledge used in OSP did not 
represent the image scene well. It was made up of only five panel signatures 
{P1,P2,P3,P4,P5} and did not include background signatures such as the large grass 
field, the forest on the left edge and the road on the right edge of the scene. Fig. 
15.26(a-c) shows the detection results of OSP with γ = 0.999, 0.998, 0.997 where the binary 
images were obtained by thresholding the OSP-generated gray scale images using the 
values of α_0 that were determined by γ via (15.15). 




As we can see from Tables 15.7(a)-15.7(c) and the detection images in Fig. 15.26, OSP 
failed to detect most of the panel pixels. No R pixel was detected when γ = 0.999 and only 
one R pixel, in row 5, was detected with γ = 0.998 and 0.997. By contrast, CEM 
performed very well. Fig. 15.27(a-c) shows its detection results with γ = 0.999, 0.998, 
0.997 where the binary images were obtained by thresholding the CEM-generated gray 
scale images using the values of α_0 that were determined by γ via (15.15). 

If we compare Fig. 15.27(a,b) to Figs. 15.23(a,b)-15.25(a,b), the performance of 
CEM is very close to that of LSRMA except that CEM had fewer false alarm pixels. 
These experiments further demonstrate that without complete target knowledge, which 
should also include background signatures, OSP simply cannot compete with CEM and 
LSRMA. 

In analogy with Figs. 9.5, 11.8 and 12.5, we can also plot for LSRMA a 3-D ROC 
curve and 2-D ROC curves based on three parameters, R_H, R_FA and a%, via a%MPCV 
defined in (9.3). However, if we replace a% with the confidence coefficient γ as another 
parameter, we can derive a new mixed-to-pure converter from a%MPCV defined in 
(9.3), referred to as γMPCV, which can be defined by 

$$\chi_{\gamma\mathrm{MPCV}}(\mathbf{r}) = \begin{cases} 1, & \text{if } \hat{\alpha}(\mathbf{r}) > \alpha_0 \\ 0, & \text{otherwise} \end{cases} \tag{15.16}$$

where α_0 is determined by γ through P(α_0) = γ in (15.15). 

By means of (15.16) we can also plot 3-D ROC curves and 2-D ROC curves via 
γMPCV for LSRMA, OSP and CEM based on (R_H,R_FA,γ). In this case, two types of 3-D 
ROC curves can be generated and plotted by a%MPCV and γMPCV for comparative 
analysis. One is generated by a%MPCV based on (R_H,R_FA,a%) as we have seen in Figs. 
9.5 and 12.5. The other is produced by γMPCV based on (R_H,R_FA,γ) where the confidence 
coefficient γ is determined by the proposed automatic thresholding method via (15.15). 
Fig. 15.28(a-b) plots these two types of 3-D ROC curves of (R_H,R_FA,a%) and (R_H,R_FA,γ) 
respectively, which were generated by LSRMA using skewness, LSRMA using kurtosis, 
OSP and CEM for target hit of the 15 panels in Fig. 1.7(a), where a target is hit if either 
an R or a Y pixel is detected. 
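
One way to picture how the (hit rate, false alarm rate, γ) triples of such a 3-D ROC curve could be generated is sketched below. This is an illustration under assumptions, not the procedure used for the figures: the threshold is taken from the empirical distribution of the detector output, the hit rate is the fraction of targets with at least one detected pixel, and the false alarm rate is the fraction of non-target pixels detected. All names and the toy scores are hypothetical.

```python
import math


def threshold_for_confidence(scores, gamma):
    """Empirical threshold: smallest value with at least a fraction gamma
    of the scores at or below it (an assumed stand-in for the rule in (15.15))."""
    ordered = sorted(scores)
    k = min(max(math.ceil(gamma * len(ordered)), 1), len(ordered))
    return ordered[k - 1]


def roc_points(scores, targets, gammas):
    """Return (hit_rate, false_alarm_rate, gamma) for each gamma.

    scores  : detector output per pixel
    targets : list of targets, each a list of pixel indices
    """
    target_pixels = {i for t in targets for i in t}
    background = [i for i in range(len(scores)) if i not in target_pixels]
    points = []
    for gamma in gammas:
        a0 = threshold_for_confidence(scores, gamma)
        # A target counts as hit if any one of its pixels exceeds the threshold.
        hits = sum(1 for t in targets if any(scores[i] > a0 for i in t))
        false_alarms = sum(1 for i in background if scores[i] > a0)
        points.append((hits / len(targets), false_alarms / len(background), gamma))
    return points


# Toy data: 96 background pixels and two 2-pixel targets (hypothetical scores).
scores = [0.01 * i for i in range(96)] + [5.0, 4.0, 0.5, 3.0]
targets = [[96, 97], [98, 99]]
points = roc_points(scores, targets, gammas=[0.975, 0.955])
for point in points:
    print(point)
```

Sweeping γ over a fine grid and plotting the resulting triples would give a curve of the kind shown in Fig. 15.28(b); projecting out γ gives the corresponding 2-D curve.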




(a) 3-D ROC curves of (R_H,R_FA,a%)    (b) 3-D ROC curves of (R_H,R_FA,γ) 

Figure 15.28. 3-D ROC curves generated by a%MPCV and γMPCV 



Fig. 15.29(a-b) plots their corresponding 2-D ROC curves of (R_H,R_FA) produced by 
Fig. 15.28(a-b) respectively. 

(a) abundance    (b) confidence coefficient γ 

Figure 15.29. 2-D ROC curves 



As shown in these figures, LSRMA using skewness and kurtosis performed very 
similarly to CEM and clearly outperformed OSP. In order to quantitatively evaluate their 
target hit performance, the areas under their corresponding ROC curves were calculated 
and tabulated in Table 15.8. 
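
The text does not say how the areas under the ROC curves were computed. A common choice, assumed here purely for illustration, is the trapezoidal rule applied to sampled (false alarm rate, hit rate) points; the function name and sample points below are hypothetical.

```python
def area_under_curve(points):
    """Trapezoidal area under a curve given as (x, y) pairs.

    Points are sorted by x first, so they may be supplied in any order.
    """
    pts = sorted(points)
    area = 0.0
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Hypothetical ROC samples: (false alarm rate, hit rate).
roc = [(0.0, 0.0), (0.1, 0.6), (0.5, 0.9), (1.0, 1.0)]
print(area_under_curve(roc))

# A detector no better than chance lies on the diagonal.
print(area_under_curve([(0.0, 0.0), (1.0, 1.0)]))
```

A larger area indicates better overall performance across all operating points, which is why a single number per method can summarize each curve.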



Table 15.8. Areas under the ROC curves 

                  LSRMA        LSRMA        OSP       CEM
                  (skewness)   (kurtosis)
(R_H,R_FA,a%)     0.77798      0.76008      0.59133   0.78376
(R_H,R_FA,γ)      0.77826      0.76031      0.59171   0.78356

Interestingly, a%MPCV and γMPCV yielded nearly the same target hit rates in 
Table 15.8 even though the shapes of their corresponding 2-D ROC curves of (R_H,R_FA) in 
Fig. 15.29(a-b) were different, particularly those generated by OSP. This is because the 
target hit rate is calculated based on overall performance, not on a specific percentage or a 
particular level of confidence coefficient. To further see their relative difference, we also 
plot 2-D curves of (R_H,γ) and (R_H,a%) in Fig. 15.30(a-b) for comparison. As we can see 
from Fig. 15.30(a), R_H drops rapidly after a% exceeds 15%. By contrast, R_H 
decreases gradually as γ increases. 



(a) 2-D curves of (R_H,a%)    (b) 2-D curves of (R_H,γ) 

Figure 15.30. 2-D curves of LSRMA using skewness and kurtosis, OSP and CEM 



A similar observation can also be made on the 2-D curves of (R_FA,γ) and (R_FA,a%) 
plotted in Fig. 15.31(a-b), with sudden drops of R_FA over [5%,15%] for LSRMA and 
CEM. 

The 2-D curves in Figs. 15.30-15.31 suggest that a%MPCV is sensitive to the selection 
of a%. In particular, the high performance of a%MPCV resulting from small abundance 
percentages (<15%) was offset by poor performance resulting from higher abundance 
percentages (>15%). Compared to a%MPCV, the performance of γMPCV did not show 
such drastic changes as γ increased. Nevertheless, their overall performance is very close 
as shown in Table 15.8. 



15.6 CONCLUSIONS 

This chapter presents an ICA-based approach, 
linear spectral random mixture analysis (LSRMA), to hyperspectral target detection and 
classification. It differs from the commonly used ICA approach in that the learning 
algorithm is derived from the orthogonality constraint imposed on the abundance vector 
rather than the separating matrix W. In addition, the separating matrix W is not 
necessarily of full rank, nor is the mixing matrix M necessarily orthogonal. These advantages are 
very useful in hyperspectral image classification. Since hyperspectral sensors can uncover 
targets with subtle differences, their spectral signatures are generally similar to some 
degree and may not be orthogonal. The newly designed learning algorithm is able to 
converge to non-orthogonal independent components. In addition, two types of 3-D ROC 
curves generated by a%MPCV and γMPCV are also introduced for comparative study 
and analysis. According to the conducted experiments, skewness and kurtosis used in 
LSRMA have demonstrated different strengths in classification. Because skewness 
measures the asymmetry of a distribution, it can generally detect changes in large areas. On 
the other hand, kurtosis measures the flatness of a distribution, and thus it can detect small 
targets. That explains why kurtosis worked effectively for the 1.5-m HYDICE data, while 
skewness performed well for the 20-m AVIRIS data. 

As a concluding remark, LSRMA can be considered a special case of projection 
pursuit (PP), which will be studied in Chapter 16, where PP designs a projection index 
(PI) to look for projections of interestingness. The two criteria, relative entropy specified 
by (15.3) and statistical moments specified by (15.6) (particularly skewness and kurtosis), 
used for LSRMA can be viewed as particular projection indices. Interestingly, developing 
a PP approach for unsupervised hyperspectral image analysis using the concept of 
relative entropy as a PI was explored in Ifarraguerri and Chang (2000). On the other hand, 
a recent PP approach using skewness and kurtosis as a PI for unsupervised target 
detection and classification was also investigated in Chiang, Chang and Ginsberg (2001). 
These ideas are identical to those of LSRMA. As a general case, if the relative entropy 
specified by (15.3) is used as the PI, which measures statistical independence among the 
components, LSRMA actually becomes a special case of PP. More details can be 
found in Chapter 16. 

AUTOMATIC MIXED PIXEL CLASSIFICATION 
(AMPC): PROJECTION PURSUIT 



In Chapter 15, LSRMA made use of skewness and kurtosis to detect and classify 
potential targets automatically without prior target knowledge. These two criteria were a 
simplification of the contrast function in (15.5) derived by Comon. In this chapter, we 
present a similar approach developed by Chiang, Chang and Ginsberg (2001), called 
projection pursuit (PP), which also uses skewness and kurtosis as a base to design various 
projection indices for automatic target detection and classification. Both LSRMA and PP 
model small targets as anomalies with an understanding that their sizes are relatively 
small in the spatial image scene compared to the image background. With this 
assumption, such small anomalies can be detected by finding outliers of the background 
distribution. The criteria, skewness and kurtosis, are well suited for this purpose. Unlike 
LSRMA, which requires a linear mixture model, the proposed PP projects a high 
dimensional data set into a low dimensional data space while retaining desired target 
information. It utilizes skewness and kurtosis as projection indices to explore projections 
of interestingness, which are those caused by small man-made targets in a large unknown 
background. To find optimal solutions for projection indices, an evolutionary algorithm 
is developed to prevent the solutions from being trapped in local optima. Finally, target 
detection and classification is achieved by projecting the image data into separate 
projection images. In order to segment potential targets from the image background in 
these projection images, a zero-detection process is developed to threshold the projection 
images into a sequence of binary images, each of which detects a particular type of 
target. This approach is different from LSRMA, where the number of independent 
components is determined by an estimate using VD in Chapter 17 or is selected a priori. 



16.1 INTRODUCTION 

Principal components analysis (PCA) is a versatile technique which has been used 
for a wide range of applications. It is based on the concept that the interesting projections 
that orient principal components are determined by data variance. However, in some 
applications the data variance may not be the only projection we are interested in. In 
target detection, one of the most interesting targets is an anomaly or a small man-made 
target. In general, such a target cannot be extracted from a large scene by PCA because 
the variance it contributes to the data is very small. Projection pursuit (PP) provides a 
solution to this problem. PP was first developed by Friedman and Tukey (1974) as a 
technique for exploratory analysis of multivariate data and has been studied extensively 
since then (Friedman, 1974; Huber, 1985; Jones and Sibson, 1987; Bullock et al., 
1994). Unlike most target detection algorithms, which require statistical 
models such as linear mixture, PP is a linear mapping that searches for interesting 
low-dimensional projections from a high-dimensional data space via a projection index (PI). 
The PI is a measure used to explore projections of interestingness. In particular, it can be 
designed to characterize nonlinear structures in projected distributions. For example, if 
the desired direction of a PI is one pointing to data variance, PP is reduced to PCA. On 
the other hand, if we would like the desired components to be statistically independent, 
ICA used to develop LSRMA in Chapter 15 becomes a special case of PP. So, PP is a 
powerful technique in many signal/image processing applications. Using PP for 
hyperspectral image classification has been studied previously by Jimenez and Landgrebe 
(1999), who designed a PI based on Bhattacharyya's distance to reduce the dimensionality 
of feature space. Ifarraguerri and Chang (2000) also used the information divergence 
(relative entropy) as a PI to look for interesting projections that deviate from Gaussian 
distributions. A similar idea was further explored in Chiang, Chang and Ginsberg 
(2001), where they modeled and considered small targets as anomalies that caused 
outliers of the background distribution. 

In military applications, man-made target detection in hyperspectral imagery is of 
major interest since the targets are generally small and sometimes even smaller than the 
ground sampling distance (GSD). Such target detection must rely on subpixel spectral 
detection, not pixel-based spatial detection. From this point of view, an interesting 
structure of an image scene is the one caused by man-made targets in a large unknown 
background. So, a small man-made target can be viewed as an anomaly in an image scene 
because its size is relatively small in the spatial sense compared to the image 
background and its spectral properties are distinct from those of its surrounding pixels. As 
a result, detecting such small targets in an unknown image scene can be reduced to 
finding the outliers or deviations from the background distribution. It is known that 
skewness, defined by the normalized third moment of the sample distribution, measures the 
asymmetry of the distribution and kurtosis, defined by the normalized fourth moment of the 
sample distribution, measures the flatness of the distribution. Both are sensitive to 
outliers. So, using skewness and kurtosis as a base to design a PI may be an effective 
means for target detection. However, a small region or set of 
background pixels may also be detected as anomalies. This results from the lack of 
prior target knowledge. 
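
The sensitivity of these two moments to outliers can be seen in a small numerical sketch. It assumes the usual definitions, the third and fourth central moments divided by the corresponding power of the standard deviation, with no "excess" correction applied to kurtosis; the function names and sample values are hypothetical.

```python
def standardized_moment(values, order):
    """Central moment of the given order divided by sigma ** order."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    return sum((v - mean) ** order for v in values) / n / var ** (order / 2)


def skewness(values):
    """Normalized third moment: asymmetry of the distribution."""
    return standardized_moment(values, 3)


def kurtosis(values):
    """Normalized fourth moment (not excess kurtosis)."""
    return standardized_moment(values, 4)


# A symmetric "background" of 100 samples, then the same with one outlier.
background = [-2.0, -1.0, 0.0, 1.0, 2.0] * 20
with_target = background + [15.0]

print(skewness(background), kurtosis(background))
print(skewness(with_target), kurtosis(with_target))
```

A single outlying sample among a hundred is enough to move both statistics far from their background values, which is the property the projection indices in this chapter rely on.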

Once a PI is determined, finding optimal solutions for the desired PI is crucial. 
Unfortunately, there are generally no analytic solutions and they must be found by 
numerical algorithms. Two major principles, "hill-climbing" and "random move", have 
been widely used to design optimization algorithms. For a relatively smooth PI where the first 
derivatives exist, "hill-climbing"-based gradient descent methods are usually preferred. 
However, they may be trapped in local optimal solutions that are close to the initial 
starting points. In this case, a good guess of initial conditions is a key to the success of 
these methods. On the other hand, a random move-based method such as simulated 
annealing (SA) can escape from local optima to some degree, although its success still 
depends on the initial starting points. In this chapter, we consider a random move-based 
approach similar to SA, called evolutionary algorithm (EA) (Goldberg, 1989; Michalewicz, 
1996; Dasgupta and Michalewicz, 1997), which is more likely to lead solutions to global 
optima. 

Prior to implementing EA, it is generally necessary to preserve the translation 
invariance property of the PI. This can be accomplished by centering the original data matrix 
using a linear translation, followed by an eigen-decomposition method such as singular 
value decomposition (SVD) to whiten the centered data matrix. The latter whitening 
process is referred to as sphering in the PP literature. EA is implemented in four stages: 
population selection, crossover, mutation and termination. Of particular interest are the 
crossover and mutation processes. While the crossover process generates a descendant 
population by combining pairs of individuals randomly chosen from a parent population, 
the mutation process performs a local optimization that makes random changes in 
individual points to search for local optimum points. The proposed EA differs slightly from 
the EA described in the literature in that it includes zero-detection thresholding to achieve 
target detection. Since we are interested in small man-made targets, the outliers revealed by 
skewness and kurtosis are attributed to these targets. In order to extract these targets, a 
zero-detection process is proposed to threshold each projection image so that targets of 
different types can be segmented by thresholding the projection images into a sequence of 
binary images. The algorithm that implements EA in conjunction with PP and the 
zero-detection thresholding method is referred to as PPEA. 
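
The centering and sphering step described above can be sketched as follows. This is only a minimal illustration and not the implementation used in the experiments: the function name `sphere`, the use of NumPy and the synthetic data are assumptions made here, and the data matrix is assumed to have full row rank.

```python
import numpy as np

def sphere(X):
    """Center and whiten a K x N data matrix whose columns are data points."""
    Xc = X - X.mean(axis=1, keepdims=True)        # linear translation (centering)
    # Thin SVD of the centered data: Xc = U diag(s) Vt
    U, s, _ = np.linalg.svd(Xc, full_matrices=False)
    n = X.shape[1]
    # Rescale every principal axis so that the sample covariance becomes identity
    W = (U / (s / np.sqrt(n - 1))).T              # whitening matrix
    return W @ Xc

# Synthetic data (assumption): three bands with very different spreads and an offset
rng = np.random.default_rng(0)
X = rng.normal(size=(3, 500)) * np.array([[5.0], [1.0], [0.2]]) + 10.0
Z = sphere(X)
```

After sphering, the sample mean of `Z` is zero and its sample covariance matrix is the identity, so a PI evaluated on `Z` responds only to higher-order structure.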



16.2 PROJECTION PURSUIT 

The term "projection pursuit (PP)" was first coined by Friedman and Tukey and 
was used as a technique for exploratory analysis of multivariate data. The idea is to 
project a high dimensional data set into a low dimensional data space while retaining the 
information of interest. It utilizes a PI to explore projections of interestingness in data. 

Following the approach suggested in Jones and Sibson (1987), we assume that there 
are $N$ data points, each with dimensionality $K$, that 
$\mathbf{X} = [\mathbf{x}_1 \mathbf{x}_2 \cdots \mathbf{x}_N]$ is a $K \times N$ data matrix, and that 
$\mathbf{a}$ is a $K$-dimensional column vector which serves as a desired projection. 
Then $\mathbf{a}^T\mathbf{X}$ represents an $N$-dimensional row vector made up of the orthogonal 
projections of all sample data points onto the direction $\mathbf{a}$, where $T$ denotes the matrix 
transpose. Now if we let $H(\cdot)$ be a function measuring the degree of interestingness of 
the projection $\mathbf{a}^T\mathbf{X}$ for a fixed data matrix $\mathbf{X}$, a projection index (PI) is a real-valued 
function of $\mathbf{a}$, $I(\mathbf{a}): \mathbb{R}^K \rightarrow \mathbb{R}$, defined by 

$$I(\mathbf{a}) = H(\mathbf{a}^T\mathbf{X}). \tag{16.1}$$

The PI can be easily extended to multiple directions, $\{\mathbf{a}_j\}_{j=1}^{J}$. In this case, 
$\mathbf{A} = [\mathbf{a}_1 \mathbf{a}_2 \cdots \mathbf{a}_J]$ is a $K \times J$ projection direction matrix and the corresponding 
projection index is also a real-valued function, $I(\mathbf{A}): \mathbb{R}^{K \times J} \rightarrow \mathbb{R}$, given by 

$$I(\mathbf{A}) = H(\mathbf{A}^T\mathbf{X}). \tag{16.2}$$

The choice of $H(\cdot)$ in (16.1) and (16.2) is application-dependent. Its purpose is to 
reveal interesting structures within data sets such as clustering. In Chapter 15, LSRMA 
with skewness, $\kappa_3$, and kurtosis, $\kappa_4$, as criteria was shown to be effective in detecting 
man-made targets. This suggests that using $\kappa_3$ and $\kappa_4$ as PIs may be appropriate for 
target detection. The following four projection indices will be used as criteria for target 
classification. 



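$$I_{skewness}(\mathbf{a}) = \kappa_3(\mathbf{a}^T\mathbf{X}) \tag{16.3}$$

$$I_{kurtosis}(\mathbf{a}) = \kappa_4(\mathbf{a}^T\mathbf{X}) \tag{16.4}$$

$$I_{mixture}(\mathbf{a}) = (\kappa_3^2 + \kappa_4^2/4)/12 \tag{16.5}$$

where $I_{mixture}(\mathbf{a})$ is a linear mixture of $\kappa_3^2$ and $\kappa_4^2$, and 

$$I_{product}(\mathbf{a}) = \kappa_3 \cdot \kappa_4 \tag{16.6}$$

where $I_{product}(\mathbf{a})$ is a product of $\kappa_3$ and $\kappa_4$. 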
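
As a rough illustration of how skewness and kurtosis respond to a few outlying pixels, the sketch below evaluates both on projected data. The function names, the standardization of the projected values and the "excess kurtosis" convention (subtracting 3) are assumptions made for this example; they are not taken from (16.3)-(16.6).

```python
import numpy as np

def projection_scores(a, X):
    """Project the columns of X onto the unit direction a and standardize."""
    a = np.asarray(a, dtype=float)
    z = (a / np.linalg.norm(a)) @ X
    return (z - z.mean()) / z.std()

def skewness_index(a, X):
    """Sample third standardized moment of the projected data."""
    return np.mean(projection_scores(a, X) ** 3)

def kurtosis_index(a, X):
    """Sample fourth standardized moment minus 3 (zero for Gaussian data)."""
    return np.mean(projection_scores(a, X) ** 4) - 3.0

# Synthetic background (assumption) with five outlying "target" pixels on axis 0
rng = np.random.default_rng(0)
X = rng.normal(size=(2, 2000))
X[0, :5] += 20.0
e0, e1 = np.array([1.0, 0.0]), np.array([0.0, 1.0])
```

In this toy example both indices are large along `e0`, the direction that contains the outliers, and close to zero along `e1`, which is why such directions are considered interesting for small targets.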

16.3 EVOLUTIONARY ALGORITHM (EA) 



Since the four PIs specified by (16.3)-(16.6) are designed to capture the third and 
fourth moments, the analysis can be largely simplified if the second-order statistics are 
taken care of by the whitening process described in Section 12.3. Such a whitening process 
is also referred to as sphering in the PP literature. So, in this section, we assume that the 
data have been whitened and the resulting whitened data matrix is still represented by $\mathbf{X}$. 
As indicated in the introduction, finding optimal PIs is not a simple matter. Here we 
present an optimization algorithm, known as the evolutionary algorithm (EA), developed 
in Goldberg (1989), Michalewicz (1996), and Dasgupta and Michalewicz (1997). It can be 
implemented in four stages (population selection, crossover, mutation and 
termination), each of which is briefly described below. For details we refer to 
these references. 

1) Initial population: 

Unlike gradient methods and simulated annealing, an EA requires an initial 
population to start with so that it can perform a multi-directional search for optimal points 
simultaneously during each iteration. A population is made up of possible projection 
vectors, each of which is viewed as an individual. To initialize EA, an initial 
population can be created randomly or with prior knowledge if there is any. A 
simple way to form an initial population is to include all basis vectors of the space 
to be projected. 

2) Population Selection Process 

Once a population is formed, each individual in the population is 
evaluated by a fitness function $f(\cdot)$ specified by the desired projection index. An 
individual with better fitness has a greater chance of generating the new 
population in the next iteration. The procedure is described as follows. 

(a) Calculate $f(\mathbf{v}_i)$ for each individual $\mathbf{v}_i$. 

(b) Calculate the probability of selecting $\mathbf{v}_i$ by $p_i = f(\mathbf{v}_i) / \sum_j f(\mathbf{v}_j)$. 

(c) Calculate the cumulative probability $q_i$ for $\mathbf{v}_i$ by $q_i = \sum_{j=1}^{i} p_j$. 

(d) Generate a random number $r$ in the range [0, 1]. 

(e) Select the $i$-th individual $\mathbf{v}_i$ such that $q_{i-1} < r \le q_i$. 

(f) Repeat steps (d) and (e) until a desired population is formed. 

It should be noted that the above selection process is stochastic and some 
individuals may be selected more than once. This is in accordance with the natural 
principle of "survival of the fittest", which says that the best individuals get more 
descendants while the average stay even and the worst die off. 
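
Steps (a)-(f) amount to fitness-proportionate (roulette-wheel) selection, which can be sketched as follows. The function name and signature are hypothetical, and the sketch assumes that the fitness values are nonnegative with a positive sum (for example, magnitudes of PI values), which the text does not state explicitly.

```python
import random
from bisect import bisect_left
from itertools import accumulate

def select_population(individuals, fitness, size, rng=random):
    """Roulette-wheel selection following steps (a)-(f)."""
    f = [fitness(v) for v in individuals]        # (a) fitness of each individual
    total = sum(f)
    p = [fi / total for fi in f]                 # (b) selection probabilities
    q = list(accumulate(p))                      # (c) cumulative probabilities
    q[-1] = 1.0                                  # guard against round-off error
    chosen = []
    while len(chosen) < size:                    # (f) repeat until enough are chosen
        r = rng.random()                         # (d) random number in [0, 1)
        i = bisect_left(q, r)                    # (e) smallest i with r <= q_i
        chosen.append(individuals[i])
    return chosen
```

Because the draw is made with replacement, an individual with a large fitness value will typically appear several times in the selected population, while an individual with zero fitness is never selected.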

3) Crossover Process 

The crossover process used in EA is called arithmetical crossover, which is 
defined as follows. Assume there are two individuals, denoted by two vectors, $\mathbf{v}$ and 
$\mathbf{w}$. Then the two vectors formed by $\mathbf{v}' = \alpha\mathbf{v} + (1-\alpha)\mathbf{w}$ and 
$\mathbf{w}' = (1-\alpha)\mathbf{v} + \alpha\mathbf{w}$ are called crossover offspring of $\mathbf{v}$ and $\mathbf{w}$ with 
$\alpha \in [0,1]$. Let $p_c$ be the pre-selected 
crossover probability, which is used to determine the number of individuals 
needed to produce crossover offspring. This number is $N_c$ with 
$p_c \cdot N_k - 1 < N_c \le p_c \cdot N_k$, where $N_k$ is the size of the population at the $k$-th 
iteration. The crossover is performed as follows. 

(a) For each individual $\mathbf{v}$ in the current population generate a random number $r$ in 
the range [0,1]. If $r < p_c$, $\mathbf{v}$ is selected as a crossover vector. The 
selection procedure is continued to form a crossover vector group until the 
expected number of selected crossover vectors is equal to $N_c$. 

(b) For each pair of individuals $\mathbf{v}$ and $\mathbf{w}$ in the selected crossover group, form their two 
crossover offspring $\mathbf{v}' = \alpha\mathbf{v} + (1-\alpha)\mathbf{w}$ and $\mathbf{w}' = (1-\alpha)\mathbf{v} + \alpha\mathbf{w}$, where $\alpha$ is a 
random number generated from the uniform distribution in the range [0, 1]. This 
requires $N_c$ to be even. So, if $N_c$ is odd, we need to select one more individual 
randomly to make it even. 

The crossover process provides a more effective means of finding 
projections that are more competitive than the ones already in the population. 
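
A minimal sketch of arithmetical crossover is given below. The helper names are hypothetical, and two details are assumptions made for the example rather than statements from the text: the offspring replace their parents, and the selected individuals are paired at random.

```python
import random

def arithmetical_crossover(v, w, alpha):
    """Return the offspring v' = a*v + (1-a)*w and w' = (1-a)*v + a*w."""
    v_new = [alpha * x + (1 - alpha) * y for x, y in zip(v, w)]
    w_new = [(1 - alpha) * x + alpha * y for x, y in zip(v, w)]
    return v_new, w_new

def crossover_step(population, p_c, rng=random):
    """One crossover pass over the population (offspring replace parents)."""
    # (a) each individual joins the crossover group with probability p_c
    group = [i for i in range(len(population)) if rng.random() < p_c]
    if len(group) % 2 == 1:
        rest = [i for i in range(len(population)) if i not in group]
        if rest:
            group.append(rng.choice(rest))   # add one more to make the group even
        else:
            group.pop()                      # everyone was selected: drop one
    rng.shuffle(group)
    new_population = [list(v) for v in population]
    # (b) pair up the group and replace each pair by its two offspring
    for i, j in zip(group[0::2], group[1::2]):
        alpha = rng.random()
        new_population[i], new_population[j] = arithmetical_crossover(
            population[i], population[j], alpha)
    return new_population
```

Since $\mathbf{v}' + \mathbf{w}' = \mathbf{v} + \mathbf{w}$, each pair of offspring lies on the line segment joining its parents, so this operator searches between existing projections rather than far away from them.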

4) Mutation Process 

Like the mutation defined in a genetic algorithm, the mutation in EA is also 
performed on a bit-by-bit basis. For each individual $\mathbf{v} = (v_1, v_2, \ldots, v_n)^T$, each component, 
say $v_i$, is expressed as a binary expansion with $l$-bit precision, $b_{i1} b_{i2} \cdots b_{il}$. 
Then $\mathbf{v}$ can be represented by a binary string, called a chromosome, formed by 
concatenating the $l$-bit binary expansions of all the $n$ components, 
$b_{11} \cdots b_{1l} \, b_{21} \cdots b_{2l} \cdots b_{n1} \cdots b_{nl}$. The mutation process is performed 
on the chromosome bit by bit. Let $p_m$ be a predetermined probability of mutation. It 
determines the number of bits to be mutated, denoted by $N_m$, whose 
expectation is equal to $p_m \cdot N_k \cdot n \cdot l$. 

(a) Encode each individual in the current population into a chromosome. 

(b) For each bit in the chromosome, generate a random number $r$ from the uniform 
distribution in the range [0,1]. If $r < p_m$, mutate the bit by flipping it 
between 0 and 1. 

(c) Decode each chromosome into a new individual. 

The goal of the mutation is to explore the neighborhood of each individual to find 
an optimal projection. 
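
The encode, flip and decode steps can be sketched as below. The text does not specify how real-valued components are mapped to bits, so the uniform quantization of each component over the interval [-1, 1] used here, as well as all function names, are assumptions made purely for illustration.

```python
import random

def encode(v, bits, lo=-1.0, hi=1.0):
    """Quantize each component in [lo, hi] to `bits` bits and concatenate."""
    levels = (1 << bits) - 1
    chrom = []
    for x in v:
        k = round((min(max(x, lo), hi) - lo) / (hi - lo) * levels)
        chrom.extend((k >> s) & 1 for s in range(bits - 1, -1, -1))
    return chrom

def decode(chrom, bits, lo=-1.0, hi=1.0):
    """Inverse of encode: turn every group of `bits` bits back into a number."""
    levels = (1 << bits) - 1
    v = []
    for start in range(0, len(chrom), bits):
        k = 0
        for b in chrom[start:start + bits]:
            k = (k << 1) | b
        v.append(lo + (hi - lo) * k / levels)
    return v

def mutate(v, p_m, bits=16, rng=random):
    """Bit-flip mutation of one individual following steps (a)-(c)."""
    chrom = encode(v, bits)                                       # (a) encode
    chrom = [b ^ 1 if rng.random() < p_m else b for b in chrom]   # (b) flip bits
    return decode(chrom, bits)                                    # (c) decode
```

With this encoding an individual of $n$ components has $n \cdot l$ bits, so on average $p_m \cdot n \cdot l$ of its bits are flipped, in agreement with the expected count given above when summed over the $N_k$ individuals.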

5) Termination Process 

There are several ways to terminate EA: (1) the algorithm reaches an optimal 
solution; (2) no significant changes are made; or (3) the number of iterations exceeds a 
predetermined limit. 
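
The three stopping rules can be combined in a single loop, as in the sketch below. Everything in it is hypothetical: `step` stands for one pass of selection, crossover and mutation, `target` is a known optimal PI value (rarely available in practice), and `tol` and `patience` are one possible way of deciding that "no significant changes are made".

```python
def run_until_done(step, population, fitness, max_iter=200,
                   tol=1e-6, patience=10, target=None):
    """Iterate `step` until one of the three termination rules fires."""
    best = max(fitness(v) for v in population)
    stale = 0
    for it in range(1, max_iter + 1):
        population = step(population)
        new_best = max(fitness(v) for v in population)
        if target is not None and new_best >= target:
            return population, it, "optimal"                 # rule (1)
        stale = stale + 1 if new_best - best <= tol else 0
        best = max(best, new_best)
        if stale >= patience:
            return population, it, "no significant change"   # rule (2)
    return population, max_iter, "iteration limit"           # rule (3)
```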



16.4 THRESHOLDING OF PROJECTION IMAGES USING ZERO-DETECTION 

Like PCA, PPEA produces a sequence of projection images, which show the 
information in decreasing order of the magnitudes of the projection values produced by a PI. It 
should be noted that the magnitude referred to here is the absolute projection value, since 
the projection values of a PI can be positive or negative. Assume that $\mathbf{a}_j$ is the $j$-th 
projection vector that maximizes a PI. Using $\mathbf{a}_j$ we can project the data matrix $\mathbf{X}$ into a 
projection space generated by $z_n^j = \mathbf{a}_j^T\mathbf{x}_n$, where the data matrix $\mathbf{X}$ is formed by all pixel 
vectors in the image, i.e. $\mathbf{X} = [\mathbf{x}_1 \mathbf{x}_2 \cdots \mathbf{x}_N]$. For example, if we assume that the 
histogram of the projection values $\{z_n^1\}_{n=1}^{N}$ behaves like a Gaussian distribution, the 
outliers that create ripples on either tail of the Gaussian distribution can be 
considered to be caused by small targets. So, the first values on each tail at which the 
histogram drops to zero are selected as the desired thresholds. Then the projected pixels 
whose gray scales exceed the thresholds are extracted and considered to be target pixels. As a 
result, two images are obtained by such zero-detection thresholding, one for target 
detection and another for use in the next projection. The one used for target detection is a 
binary image obtained by setting the pixels that exceed the thresholds to 1 
and all other pixels to 0. This thresholded binary image, denoted by $\mathbf{B}_1$, is used to tally the 
target pixels detected by this particular projection image. A second image is obtained by 
assigning 0 to all the pixels detected in the binary image $\mathbf{B}_1$ while leaving the gray scales 
of the pixels not in $\mathbf{B}_1$ unchanged. This resulting image is referred to as the first 
projection image $\mathbf{X}_1$ and is used for the next projection. The reason for producing such a 
projection image is to prevent the target pixels detected in $\mathbf{B}_1$ from being considered 
again in the new projection image, $\mathbf{X}_1$. A second projection vector $\mathbf{a}_2$ is then found 
by maximizing the PI based on the first projection image, $\mathbf{X}_1$. The projection space 
is then formed by $\{z_n^2\}$, which are generated by $\mathbf{z}^2 = \mathbf{a}_2^T\mathbf{X}_1$ via $\mathbf{a}_2$. Similarly, a binary 
image $\mathbf{B}_2$ can be obtained from $\mathbf{z}^2$ by using zero-detection in the same manner that $\mathbf{B}_1$ 
was generated. In analogy with finding $\mathbf{X}_1$, a second projection image $\mathbf{X}_2$ can also be 
formed by setting to zero the gray scales of the pixels that are in $\mathbf{B}_2$ while leaving the pixels 
that are not in $\mathbf{B}_2$ unchanged. The same procedure is continued to generate a third 
projection vector $\mathbf{a}_3$ that maximizes the PI based on the second projection image, $\mathbf{X}_2$, to 
form two projection images $\mathbf{z}^3$ and $\mathbf{X}_3$. A third thresholded binary image $\mathbf{B}_3$ is 
generated in the same way that $\mathbf{B}_1$ and $\mathbf{B}_2$ were generated for subsequent target detection, 
and so on until PPEA converges. Accordingly, a set of the desired targets can be extracted 
from the sequence $\{\mathbf{B}_j\}$ obtained by applying zero-detection thresholding to 
the sequence of projection images $\{\mathbf{z}^j\}$. 
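
One possible reading of the zero-detection step is sketched below: starting from the histogram peak, walk outward on each side until the first empty bin is met and use its edge as the threshold. The function names, the number of bins and the synthetic projection values are assumptions made for this illustration, and the actual histogramming used in the experiments may differ.

```python
import numpy as np

def zero_detection(z, bins=64):
    """Threshold projection values at the first empty histogram bin on each tail."""
    counts, edges = np.histogram(z, bins=bins)
    peak = int(np.argmax(counts))
    lower, upper = -np.inf, np.inf           # no empty bin found: nothing detected
    for b in range(peak, bins):              # walk right from the histogram peak
        if counts[b] == 0:
            upper = edges[b]
            break
    for b in range(peak, -1, -1):            # walk left from the histogram peak
        if counts[b] == 0:
            lower = edges[b + 1]
            break
    B = ((z > upper) | (z < lower)).astype(np.uint8)   # thresholded binary image
    return B, lower, upper

def remove_detected(X, B):
    """Zero out the pixel vectors (columns of X) flagged in B for the next projection."""
    return X * (1 - B)

# Synthetic projection values (assumption): a compact background plus two outliers
z = np.concatenate([np.linspace(-1.0, 1.0, 500), [5.0, 5.2]])
B, lower, upper = zero_detection(z)
X = np.vstack([z, 2 * z])
X1 = remove_detected(X, B)
```

In this example only the two outlying values are flagged in `B`, and the corresponding columns of `X1` are set to zero so that they cannot dominate the next projection.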

Using the 15-panel HYDICE scene in Fig. 1.7(a) as an example, Fig. 16.1(a) shows 
a Gaussian-like histogram of the first projection image formed by $\{z_n^1\}$, which were 
generated by $\mathbf{z}^1 = \mathbf{a}_1^T\mathbf{X}$ via $z_n^1 = \mathbf{a}_1^T\mathbf{x}_n$. The histogram in Fig. 16.1(a) 
plots the number of projected pixels versus their projection values generated 
by $z_n^1 = \mathbf{a}_1^T\mathbf{x}_n$ for $1 \le n \le N$. Since the right tail of the histogram does not clearly show 
ripples, its long and flat right tail is enlarged in Fig. 16.1(b). 





Figure 16.1. (a) A Gaussian-like histogram produced by $\mathbf{z}^1$; (b) An enlargement of the right tail in (a) 



As we can see, there are many ripples. The first value at which the histogram drops to 
zero is selected as the desired threshold value, as shown by an arrow. It should be noted 
that the histogram in Fig. 16.1(a) is a Gaussian-like distribution, but it is asymmetric. In 
this case, the projection index using skewness was able to detect the small targets in 
Figs. 16.4-16.5 that caused the ripples. 



16.5 EXPERIMENTS 

Two sets of hyperspectral data are used in this section to evaluate the performance of 
PPEA in target detection. 

16.5.1 AVIRIS Data Experiments 

The 224-band AVIRIS data in Fig. 1.6(a) were used for the experiments. Since it has been 
shown that in general there is no need to use all the bands for image classification 
(Chang et al., 1999d), only 12 bands were used in the following experiments to ease 
the computation of PPEA. These 12 bands were uniformly selected among the 224 bands. 
Additionally, it was also noted that there was a two-pixel anomaly located at the top 
edge of the dry lake marked by a white circle in Fig. 6.1(c). This single anomaly of 
two-pixel size was seen or detected visually in Harsanyi and Chang (1994) because the 
orthogonal subspace projection (OSP) was supervised. Fig. 16.2 shows the detection 
results of the first three projection images of all four PIs. 










Fig. 16.3 shows the negative counterparts of Fig. 16.2. The reason that we 
also include negative projections in Fig. 16.3 is that the projection values are not 
necessarily positive; they can be negative. The significance of the interestingness of a 
projection is determined by its magnitude, not its sign. 

As shown in Figs. 16.2-16.3, all six signatures were detected; in particular, the 
anomaly was detected by all four PIs. Since kurtosis is more interesting than skewness in 
many other applications, we use kurtosis as an example for illustration. The cinders 
showed up in the first positive projection image while the vegetation was detected in its 
negative counterpart. In the second projection, the vegetation and the rhyolite were 
extracted in the positive projection image while the shade and anomaly were detected in 
its negative counterpart. In the third projection, the playa was pulled out from the positive 
image. It turned out that both positive and negative projection images were very similar. 
We also noted that if a signature is interesting, it generally shows up in more than one 
projection image, such as the vegetation in Figs. 16.2-16.3. 



16.5.2 HYDICE Data Experiments 

The HYDICE scene in Fig. 1.7(a) was used for the following experiments. As with the 
AVIRIS data, only 12 bands were selected uniformly for the experiments to ease 
the computation of PPEA. As will be shown in the following experiments, 12 bands are 
sufficient for PPEA to detect target pixels effectively. Since Fig. 1.7(b) 
provides a ground truth map, we can evaluate the performance of PPEA by actually 
tallying the number of panel pixels detected in the thresholded binary images $\{\mathbf{B}_j\}$ 
generated by PPEA. 

Following a similar analysis in Chapter 9 and the criteria in Section 9.3, R and Y pixels 
were used to tally the results of target detection. Fig. 16.4 shows the detection results based 
on the first six projection images using the four PIs, where the projection images 
produced by skewness, kurtosis, mixture and product are labeled (a), (b), (c) and 
(d) respectively. As we can see from the images in Fig. 16.4(a-d), all the panels were 
extracted in the first three projection images. 

Fig. 16.5(a-d) shows the thresholded binary images of the first three projection 
images in Fig. 16.4(a-d), where all four PIs detected the panels in the first row in 
their third projection images. The panels in the second and third rows were detected in 
one projection image, with the panels in the fourth and fifth rows detected in another 
projection image. The panels in the second and third rows were detected 
by the same projection image because their spectra are very similar. The same is true for the 
panels in the fourth and fifth rows. 

As shown in Fig. 16.5(a-d), all 15 panels were successfully detected and 
extracted. These detection results are impressive. Since the 1 m x 1 m panels in the 
third column of the scene are smaller than the 1.5 m pixel resolution, 
they cannot be seen by visual inspection, but they are actually detected and extracted at 
the subpixel level by PPEA. Table 16.1 tabulates the PI values of the first six projection 
images for these four PIs, where the first three values of each PI are significantly 
greater than the remaining values. This observation is consistent with the 
detection results in Figs. 16.4-16.5 and provides a clue that only three projections may 
be sufficient for target detection. 



Table 16.1. PI values of the first six projection images for the four PIs 

            Projection 1  Projection 2  Projection 3  Projection 4  Projection 5  Projection 6 
skewness    141.92        198.53        61.53         22.19         7.77          6.14 
kurtosis    43666         53827         18126         839           1423          194 
mixture     1142          1138          496           26            4.44          1.735 
product     12880834      10769387      20991206      22674         1099          245 



In order to calculate various detection rates specified by (9.25)-(9.33), Fig. 16.6(a-d) 
shows the detection results that were obtained by combining the first three thresholded 
binary projection images in Fig. 16.5(a-d). Using Fig. 16.6(a-d) we can actually tally 
detected panel pixels. 




Interestingly, three pixels in the tree line detected in Fig. 16.6(a-b) by skewness and 
kurtosis, and four pixels detected in Fig. 16.6(c-d) by mixture and product, were not 
panel pixels but interferers. This is the price paid for lacking prior knowledge. It should 
be noted that for each PI the sets of pixels detected by the three thresholded binary 
projection images were disjoint. Therefore, the overall number of pixels detected by a 
particular PI is the sum of the pixels detected by its first three thresholded binary 
projection images. All four PIs produced comparable results, as shown in Tables 16.2-16.6. 



Table 16.2. Detection rates for PI using skewness 

Table 16.3. Detection rates for PI using kurtosis 

Table 16.4. Detection rates for PI using mixture 

Table 16.5. Detection rates for PI using product 

Table 16.6. Overall detection rates for four PIs 

In Tables 16.2-16.5, no false alarm rate was calculated. Only Table 16.6 includes the 
overall panel false alarm rate. This is because we were interested in detecting the 
panel pixels in the scene rather than particular panel pixels. From Table 16.6, we can see 
that the overall R-target pixel detection rates are 100% except for the PI using kurtosis, 
which produced 0.9474 but was still very close to 1. On the other hand, all four PIs 
yielded low Y-target pixel detection rates, R^^ and the target hit rates R^, This shows 
that the proposed PPEA using four PTs are indeed very effective since a higher R^^ 
reduces In addition, is very low and less than or equal to 0.1%. By contrast, 
the panel miss rate, R^^ is very high and close to 70%-75%. This is due to the fact that 
R^^ takes into account W panel pixels for target detection. In reality, Y-pixels generally 
are not panel center pixels, and they may be mixed by background pixels. So, a high 
R^^ does not necessarily imply low R^^. This is particularly true when the number of 
Y-pixels is relatively large compared to that of R-pixels. 

In order to further demonstrate the performance of PPEA, PCA was also studied for 
comparison. It is known that PCA can be considered to be a special case of PP with data 
variance used as a projection index. The results of the first 6 component images are 
shown in Fig. 16.7. 
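
The remark that PCA is a special case of PP can be checked numerically: if the projection index is taken to be the sample variance of the projected data, the maximizing direction is the leading eigenvector of the sample covariance matrix. The sketch below uses synthetic data and hypothetical names, and it works with the covariance matrix for simplicity, which is an assumption of this example.

```python
import numpy as np

def variance_index(a, X):
    """Projection index given by the sample variance of the projected data."""
    a = np.asarray(a, dtype=float)
    a = a / np.linalg.norm(a)
    return np.var(a @ X, ddof=1)

# Synthetic 4-band data (assumption) with a different spread in each band
rng = np.random.default_rng(1)
X = rng.normal(size=(4, 300)) * np.array([[3.0], [2.0], [1.0], [0.5]])

eigvals, eigvecs = np.linalg.eigh(np.cov(X))   # eigenvalues in ascending order
a_pca = eigvecs[:, -1]                         # first principal direction
```

The value `variance_index(a_pca, X)` equals the largest eigenvalue, and no other unit direction gives a larger value. This also illustrates why small targets, which contribute little variance, tend to appear only in later principal components.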




Compared with Fig. 16.4, where the four proposed PIs extracted all 15 panels in 
their first three projection images, PCA picked up these panels in the 4th, 5th and 6th 
component images. This is because the panels are small targets in the 
scene and do not generate significant information compared with that provided by the 
large grass field and tree line. As a consequence, the grass and tree line were extracted in 
the first three principal component images in Fig. 16.7. Interestingly, the interferers in 
the tree line also showed up in the first and third component images. This example 
shows that PCA can be used for preserving target information but not for the purpose of 
detecting small targets. Furthermore, in order to account for the nonstationarity of the data, 
the sample correlation matrix instead of the sample covariance matrix was used for PCA. 

To conclude our experiments, we further plotted in Fig. 16.8(a-d) the learning curves 
of the first six projections (row 1-row 6 respectively) using PPEA for the four PIs specified 
by (16.3)-(16.6) respectively, with the x-axis specified by the number of iterations and 
the y-axis specified by the values of the PIs. In all cases, no more than 200 iterations are 
required in each optimal projection search. In particular, only approximately 150 iterations 
are needed to achieve the optimal search for the first three projections. 

Figure 16.8. Learning curves of the first six projections using PPEA for four PIs 



A final comment is noteworthy. Due to the fact that the spatial resolution of 
HYDICE data is 1.5 m as opposed to 20 m for AVIRIS data, high spatial resolution 
generally uncovers small targets that may cause the outliers of the background 
distribution. As expected, PPEA performed better for HYDICE data than for AVIRIS 
data. 



16.6 CONCLUSIONS 

Projection pursuit (PP) has received considerable interest in multivariate analysis 
because it can be used to explore interesting projections by which a high dimensional 
data set can be projected into a low dimensional space for various applications. This 
chapter presents a new application of PP in AMPC for hyperspectral imagery. Since in 
military applications targets of interest are relatively small compared to their surrounding 
background, these targets can be viewed as pixels that cause outliers of the background 
distribution. To meet this need, four projection indices are suggested which are based on 
skewness and kurtosis. A revised PP-based evolutionary algorithm (PPEA) is developed 
to find the optimal projections. Furthermore, zero-detection thresholding is included in 
PPEA to achieve target detection. In order to use PPEA for target identification, a 
database or spectral library will be required. In this case, the spectral criteria introduced in 
Chapter 2 can be used to measure the spectral similarity between the detected pixels and 
the spectra in the database to achieve target identification. Finally, due to the nature of EA, 
the computational cost of PPEA is generally very high. However, this can be 
compensated for by high computing power. 



ESTIMATION FOR VIRTUAL DIMENSIONALITY OF HYPERSPECTRAL IMAGERY 



Determination of intrinsic dimensionality (ID) for remotely sensed imagery is a 
challenging problem. According to the definition given in Fukunaga (1990, p. 280), the 
ID, also referred to as effective dimensionality, is the minimum number of parameters 
required to account for the observed properties of the data. A general approach to ID 
estimation is principal components analysis (PCA) that makes use of the eigenvalue 
distribution to determine ID. This approach may be suitable to multispectral imagery 
since only a small number of bands are used and the resulting ID is expected to be small. 
However, the PCA method can be difficult to implement if it is applied to hyperspectral 
imagery. More importantly, it may not be effective even if it is applicable. With very 
high spectral resolution hyperspectral sensors which can extract many unknown subtle 
material substances, determining ID of hyperspectral imagery is more problematic than 
that of multispectral imagery. In order to account for such unknown signal sources, we 
introduce a new definition, referred to as virtual dimensionality (VD) in this chapter, 
which is the minimum number of spectrally distinct signal sources that characterize the 
hyperspectral data from a perspective view of target detection and classification. These 
signal sources may include unknown interfering sources, which cannot be identified by a 
priori knowledge. With this new definition, three eigen-thresholding-based methods are 
presented to determine the VD of hyperspectral imagery. They are all derived from the 
Neyman-Pearson detection theory. Since an information criterion (AIC) and the 
minimum description length (MDL) have been commonly used in sensor array 
processing to estimate the number of signals impinging upon the array, they are also 
investigated and evaluated for comparison. 



17.1 INTRODUCTION 

The true dimensionality of multivariate data is difficult to determine in practice since 
its intrinsic dimensionality (ID) cannot be simply determined by the dimensionality of a 
data sample vector, which is the number of components in a data vector, referred to as 
component dimensionality. In particular, when very high dimensional data are well 
structured, the data tend to be clouded in a low dimensional space. In this case, the ID is 
expected to be much smaller than the component dimensionality. Several well-known 
methods have been proposed, such as principal components analysis (PCA) (Richards, 
1993) and factor analysis (Malinowski, 1977a; Malinowski, 1977b), which make use of the 
eigenvalue distribution to determine the ID. Such approaches may be suitable for 
multispectral imagery, where the ID is expected to be small. However, a direct application 
of these methods to hyperspectral imagery may not be feasible. More importantly, it may 
not be effective even if it is applicable (Chang and Du, 1999). This is particularly true for 
hyperspectral imagery, where the component dimensionality is specified by hundreds of 
spectral channels and the ID is generally much smaller. Accordingly, determination of the ID 
is critical to the success of hyperspectral image analysis, such as linear 
unmixing methods, which require knowledge of how many image endmembers are present in 
the image data. Unfortunately, ID determination can be very tricky and challenging. In order 
to account for unknown signal sources and to avoid confusion with the ID, we 
introduce a new definition, referred to as virtual dimensionality (VD), in this chapter. It is 
defined as the minimum number of spectrally distinct signal sources that characterize the 
hyperspectral data from the perspective of target detection and classification, rather 
than the image endmembers defined in Schowengerdt (1997), which are idealized, pure 
signatures. These signal sources may include known and unknown image endmembers, 
natural signatures, anomalies and interferers (Chang et al., 1998; Chang and Du, 1999). 
The definition of VD used here has a very narrow implication driven by the need for target 
detection and classification. Based on this new definition, three methods are presented to 
estimate VD. 

Recently, Harsanyi, Farrand and Chang developed an eigen-thresholding method, 
referred to as the Harsanyi-Farrand-Chang (HFC) method, to determine the number of 
spectral endmembers in AVIRIS data (Harsanyi et al., 1997). It was derived from the 
Neyman-Pearson detection theory (Poor, 1994). Their idea can be briefly described as 
follows. Let the eigenvalues of the sample correlation matrix and the sample covariance 
matrix be denoted by correlation-eigenvalues and covariance-eigenvalues respectively. Since the 
component dimensionality is equal to the total number of eigenvalues, each eigenvalue 
specifies a component dimension with a certain level of significance provided by that 
particular component in terms of energy or variance. If there is no signal source contained 
in a component, its corresponding correlation-eigenvalue and covariance-eigenvalue 
should reflect only noise energy, in which case the correlation-eigenvalue and the 
covariance-eigenvalue must be equal. This fact allows us to formulate the difference 
between the correlation-eigenvalue and the covariance-eigenvalue as a binary composite 
hypothesis testing problem, where the null hypothesis represents the case of zero 
difference while the alternative hypothesis represents the case that the difference is greater 
than zero. When the Neyman-Pearson test is applied to each pair of a correlation-eigenvalue 
and its corresponding covariance-eigenvalue, the number of times the test fails indicates 
how many signal sources are present in the image. In other words, a failure of the 
Neyman-Pearson test in a component implies that the alternative hypothesis is true, which 
indicates that there is a signal source in this particular component. Using this approach 
we can estimate VD, and receiver operating characteristic (ROC) analysis can be 
used to evaluate the effectiveness of the decision. Since the HFC method does not 
decorrelate the data, an alternative approach is to include a noise whitening process in the 
HFC method to remove the second-order statistical correlation prior to the use of the 
HFC method. The resulting method will be referred to as the noise-whitened HFC 
(NWHFC) method. In this case, noise estimation is required for the NWHFC method. 

An important assumption made by the HFC and NWHFC methods is that the sample size 
must be sufficiently large so that the covariance between the correlation-eigenvalue and 
the corresponding covariance-eigenvalue is asymptotically zero. However, this may not 
be valid for a small sample size. In order to alleviate this problem, a third method, 
referred to as the noise subspace projection (NSP) approach, was recently proposed in Du 
(2000) and Du and Chang (1999). It can be viewed as an extension of HFC via NWHFC. 
It first estimates the noise covariance matrix of the image data, then uses it to whiten the 
noise variances to unity. The binary composite hypothesis problem used in the HFC 
method can then be simplified so that the null hypothesis is specified by the 
case of unity variance while the alternative hypothesis is the case that the variance is greater 
than unity. As a result, calculating both correlation-eigenvalues and 
covariance-eigenvalues can be avoided. An immediate benefit of the NSP method is that the 
sample size need not be as large as required by the HFC and NWHFC methods. In 
order to estimate the noise covariance matrix, three methods will be considered. Two 
were suggested by Roger and Arnold (1996) and the third was recently developed by Du 
and Chang (1999), where it was used to estimate the number of hidden nodes for a radial 
basis function neural network. 
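
The effect of noise whitening on the eigenvalues can be illustrated with synthetic data. In the sketch below the noise covariance matrix is assumed to be known exactly, whereas in practice it must be estimated by one of the methods cited above; the function name, the two synthetic signatures and the band noise levels are all assumptions made for the example.

```python
import numpy as np

def noise_whiten(X, noise_cov):
    """Whiten L x N data so that noise with covariance noise_cov has unit variance."""
    w, V = np.linalg.eigh(noise_cov)
    W = V @ np.diag(w ** -0.5) @ V.T         # symmetric inverse square root
    return W @ X

rng = np.random.default_rng(0)
L, N = 6, 5000
noise_std = np.array([0.5, 1.0, 1.5, 2.0, 1.0, 0.7])     # unequal band noise
M = np.array([[4, 0], [4, 0], [4, 0],
              [0, 4], [0, 4], [0, 4]], dtype=float)      # two synthetic signatures
X = M @ rng.normal(size=(2, N)) + noise_std[:, None] * rng.normal(size=(L, N))

Xw = noise_whiten(X, np.diag(noise_std ** 2))
eigvals = np.sort(np.linalg.eigvalsh(np.cov(Xw)))[::-1]  # descending order
```

After whitening, the two largest eigenvalues are well above unity while the remaining four are close to unity, which is the situation that the simplified hypothesis test of unity variance versus variance greater than unity is designed to distinguish.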

One advantage of a noise whitening process is that it enables us to examine 
the applicability of two well-established information theoretic criteria, an information 
criterion (AIC) and minimum description length (MDL), to VD determination. The AIC and 
MDL have been widely used in passive sensor array processing to determine the number 
of signals impinging upon the sensors. However, a crucial requirement for AIC and MDL 
to be effective is that the noise must be independent and identically distributed (i.i.d.). It is 
our hope that a noise whitening process prior to the AIC and MDL may help relax this 
limitation. Unfortunately, as will be demonstrated by experiments, the noise whitening 
process is not sufficient. In fact, AIC and MDL require more than second-order 
independence to be effective. 



17.2 NEYMAN-PEARSON DETECTION THEORY-BASED EIGEN-THRESHOLDING ANALYSIS (HFC METHOD) 

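A Neyman-Pearson detection theory-based eigen-thresholding method, referred to as the 
HFC method, was previously developed by Harsanyi et al. (1994) to determine the 
number of endmembers in AVIRIS data. It first calculates the sample correlation matrix 
$\mathbf{R}_{L\times L}$ and the sample covariance matrix $\mathbf{K}_{L\times L}$, then finds the difference 
between their corresponding eigenvalues. Let 
$\{\hat{\lambda}_1 \ge \hat{\lambda}_2 \ge \cdots \ge \hat{\lambda}_L\}$ and $\{\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_L\}$ be two sets 
of eigenvalues generated by $\mathbf{R}_{L\times L}$ and $\mathbf{K}_{L\times L}$, called correlation eigenvalues and 
covariance eigenvalues respectively. Assume that signal sources have positive energies and 
the noise variance in band $l$ is given by $\sigma_l^2$. We can expect that 

$$\hat{\lambda}_l - \lambda_l > 0 \quad \text{for } l = 1,2,\ldots,\mathrm{VD}, \tag{17.1}$$

and 

$$\hat{\lambda}_l - \lambda_l = 0 \quad \text{for } l = \mathrm{VD}+1,\ldots,L \tag{17.2}$$

where 

$$\hat{\lambda}_l > \lambda_l > \sigma_l^2 \ \text{for } l = 1,2,\ldots,\mathrm{VD} \quad \text{and} \quad \hat{\lambda}_l = \lambda_l = \sigma_l^2 \ \text{for } l = \mathrm{VD}+1,\ldots,L. \tag{17.3}$$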
In order to determine VD, Harsanyi, Farrand and Chang (1994) formulated the 
problem outlined by (17.1) and (17.2) as a binary hypothesis problem as follows. 

$$H_0: z_l = \hat{\lambda}_l - \lambda_l = 0 \quad \text{versus} \quad H_1: z_l = \hat{\lambda}_l - \lambda_l > 0 \quad \text{for } l = 1,2,\ldots,L \tag{17.4}$$

where the null hypothesis $H_0$ and the alternative hypothesis $H_1$ represent the case that 
the correlation-eigenvalue is equal to its corresponding covariance-eigenvalue and the case 
that the correlation-eigenvalue is greater than its corresponding covariance-eigenvalue 
respectively. In other words, when $H_1$ is true (i.e., $H_0$ fails) for some component $l$, it 
implies that there is a signal source contributing to the correlation-eigenvalue in this 
particular component $l$ in addition to noise. As a result, the $l$-th eigenvalue of $\mathbf{R}_{L\times L}$ is 
greater than the corresponding $l$-th eigenvalue of $\mathbf{K}_{L\times L}$. 

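According to Anderson (1984), we can model each pair of eigenvalues, $\hat{\lambda}_l$ and $\lambda_l$ in 
(17.1)-(17.3), under hypotheses $H_0$ and $H_1$ as random variables with their asymptotic 
conditional probability densities given by 

$$p_0(z_l) = p(z_l \mid H_0) \cong N(0, \sigma_{z_l}^2) \quad \text{for } l = 1,2,\ldots,L \tag{17.5}$$

and 

$$p_1(z_l) = p(z_l \mid H_1) \cong N(\mu_l, \sigma_{z_l}^2) \quad \text{for } l = 1,2,\ldots,L \tag{17.6}$$

respectively, where $\mu_l$ is an unknown constant and the variance $\sigma_{z_l}^2$ is given by 

$$\sigma_{z_l}^2 = \mathrm{Var}[\hat{\lambda}_l - \lambda_l] = \sigma_{\hat{\lambda}_l}^2 + \sigma_{\lambda_l}^2 - 2\,\mathrm{Cov}(\hat{\lambda}_l, \lambda_l) \quad \text{for } l = 1,2,\ldots,L \tag{17.7}$$

with $\sigma_{\hat{\lambda}_l}^2$ and $\sigma_{\lambda_l}^2$ given by (Anderson, 1984) 

$$\sigma_{\hat{\lambda}_l}^2 = \mathrm{Var}[\hat{\lambda}_l] \cong \frac{2\hat{\lambda}_l^2}{N} \tag{17.8}$$

$$\sigma_{\lambda_l}^2 = \mathrm{Var}[\lambda_l] \cong \frac{2\lambda_l^2}{N} \tag{17.9}$$

and $N$ is the total number of samples. Since $\mathrm{Cov}(\hat{\lambda}_l, \lambda_l)$ in (17.7) is bounded by 
Schwarz's inequality 

$$\mathrm{Cov}(\hat{\lambda}_l, \lambda_l) \le \sqrt{\mathrm{Var}[\hat{\lambda}_l]\,\mathrm{Var}[\lambda_l]} \cong (2/N)(\hat{\lambda}_l \cdot \lambda_l), \tag{17.10}$$

$\mathrm{Cov}(\hat{\lambda}_l, \lambda_l) \to 0$ as $N \to \infty$. This results in 

$$\sigma_{z_l}^2 \cong \sigma_{\hat{\lambda}_l}^2 + \sigma_{\lambda_l}^2 \cong \frac{2\hat{\lambda}_l^2}{N} + \frac{2\lambda_l^2}{N} \quad \text{as } N \to \infty. \tag{17.11}$$

(17.11) implies that $\sigma_{z_l}^2$ is asymptotically equal to the sum of $\sigma_{\hat{\lambda}_l}^2$ and $\sigma_{\lambda_l}^2$, and can be 
approximated by $2\hat{\lambda}_l^2/N + 2\lambda_l^2/N$ asymptotically. 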

From (17.4)-(17.6) and (17.11), we define the false alarm probability and detection 
probability (detection power) as follows: 

$$P_F = \int_{\tau_l}^{\infty} p_0(z)\,dz \tag{17.12}$$

$$P_D = \int_{\tau_l}^{\infty} p_1(z)\,dz \tag{17.13}$$

where $\tau_l$ is the decision threshold for the $l$-th component. 

Using (17.4), (17.12)-(17.13), the maximum detection probability $P_D$ in determining 
VD for any given false alarm probability $P_F$ is given by the Neyman-Pearson 
detector, $\delta_{NP}$. 
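
The eigen-thresholding procedure of this section can be sketched in a few lines of Python with NumPy. This is a minimal sketch under stated assumptions, not the original implementation: the function name `hfc_vd`, the row-per-pixel data layout, and the choice of setting each threshold from the one-sided Gaussian tail of (17.5) with the asymptotic variance (17.11) are all illustrative choices.

```python
import numpy as np
from statistics import NormalDist


def hfc_vd(pixels, p_fa=1e-3):
    """Count components whose eigenvalue difference passes the test (17.4).

    pixels: array of shape (N, L), one L-band pixel vector per row
            (layout is an assumption of this sketch).
    p_fa:   false alarm probability P_F.
    """
    n_samples = pixels.shape[0]
    # Sample correlation matrix R (no mean removal) and covariance matrix K.
    corr = pixels.T @ pixels / n_samples
    mean = pixels.mean(axis=0)
    cov = corr - np.outer(mean, mean)
    # Correlation- and covariance-eigenvalues in descending order.
    lam_corr = np.sort(np.linalg.eigvalsh(corr))[::-1]
    lam_cov = np.sort(np.linalg.eigvalsh(cov))[::-1]
    z = lam_corr - lam_cov
    # Asymptotic standard deviation of z_l, following (17.11).
    sigma_z = np.sqrt(2.0 * (lam_corr**2 + lam_cov**2) / n_samples)
    # Under H0, z_l ~ N(0, sigma_z^2); pick tau_l so that P(z_l > tau_l) = P_F.
    tau = NormalDist().inv_cdf(1.0 - p_fa) * sigma_z
    return int(np.sum(z > tau))
```

With exactly zero-mean data the two matrices coincide, every difference is zero and the sketch returns 0; a strong common offset in all bands produces at least one detected component.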



17.3 ESTIMATION OF NOISE COVARIANCE MATRIX 

As noted in (17.10), $\mathrm{Cov}(\hat{\lambda}_l, \lambda_l)$ may not be zero when the sample size is small. In 
this case, we consider an alternative approach which only requires computation of $\mathbf{R}_{L\times L}$ 
or $\mathbf{K}_{L\times L}$, but not both. The trade-off is that we need to know the noise second-order 
statistics to be used for whitening. 

There exist many methods to estimate the noise covariance matrix, such as residual-based 
estimation (Roger and Arnold, 1996) taking into account intra-band correlation, 
nearest neighbor difference (NND) (Green et al., 1988; Lee et al., 1990) using inter-band 
correlation, and linear regression model-based prediction (Roger and Arnold, 1999) 
taking advantage of intra/inter-band correlation. Since NND has been shown to be a poor 
estimator (Chang and Du, 1999), only the residual analysis and linear regression 
model-based prediction methods developed by Roger and his colleagues (Roger, 1999; Roger 
and Arnold, 1999) will be discussed in this chapter for noise covariance matrix 
estimation. 

17.3.1 Residual Analysis (Roger, 1996) 

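Assume that a hyperspectral image $\Omega$ is acquired by $L$ spectral bands. $\Omega$ is 
indeed an image cube made up of $L$ band images, $\{B_l\}_{l=1}^{L}$. As a result, each pixel vector 
in an $L$-band multispectral/hyperspectral image cube $\Omega$ is denoted by an $L$-dimensional 
column vector $\mathbf{x}_{ij} = (x_{ij1}, x_{ij2}, \ldots, x_{ijL})^T$ with $x_{ijl}$ being the pixel at the spatial 
location $(i,j)$ in $B_l$. Let $\mathbf{K}_{L\times L}$ be the sample covariance matrix given by 
$\mathbf{K}_{L\times L} = \frac{1}{IJ}\sum_{i=1}^{I}\sum_{j=1}^{J}(\mathbf{x}_{ij} - \boldsymbol{\mu})(\mathbf{x}_{ij} - \boldsymbol{\mu})^T$, where $\boldsymbol{\mu}$ is the sample mean given by 
$\boldsymbol{\mu} = (1/IJ)\sum_{i=1}^{I}\sum_{j=1}^{J}\mathbf{x}_{ij}$ and $IJ$ is the total number of pixels in the image. Then 
$\mathbf{K}_{L\times L}$ can be decomposed as follows. 

$$\mathbf{K}_{L\times L} = \mathbf{D}_K \mathbf{R}_K \mathbf{D}_K \tag{17.14}$$

where $\mathbf{D}_K$ is a diagonal matrix given by $\mathbf{D}_K = \mathrm{diag}\{\sigma_1, \sigma_2, \ldots, \sigma_L\}$ with $\{\sigma_l^2\}_{l=1}^{L}$ being the 
diagonal elements of $\mathbf{K}_{L\times L}$, $\sigma_l^2$ is the variance of $B_l$, and 

$$\mathbf{R}_K = \begin{bmatrix} 1 & \rho_{12} & \rho_{13} & \cdots & \rho_{1L} \\ \rho_{21} & 1 & \rho_{23} & \cdots & \rho_{2L} \\ \rho_{31} & \rho_{32} & 1 & \ddots & \vdots \\ \vdots & \vdots & \ddots & \ddots & \rho_{(L-1)L} \\ \rho_{L1} & \rho_{L2} & \cdots & \rho_{L(L-1)} & 1 \end{bmatrix} \tag{17.15}$$

with $\rho_{mn}$ being the correlation coefficient at the $(m,n)$-th entry of $\mathbf{R}_K$ for $m \ne n$. 

Similarly, in analogy with the decomposition of $\mathbf{K}_{L\times L}$, its inverse $\mathbf{K}_{L\times L}^{-1}$ can also be 
decomposed as 

$$\mathbf{K}_{L\times L}^{-1} = \mathbf{D}_{K^{-1}} \mathbf{R}_{K^{-1}} \mathbf{D}_{K^{-1}} \tag{17.16}$$

where $\mathbf{D}_{K^{-1}}$ is a diagonal matrix given by $\mathbf{D}_{K^{-1}} = \mathrm{diag}\{\varsigma_1, \varsigma_2, \ldots, \varsigma_L\}$ with $\{\varsigma_l^2\}_{l=1}^{L}$ being the 
diagonal elements of $\mathbf{K}_{L\times L}^{-1}$ and 

$$\mathbf{R}_{K^{-1}} = \begin{bmatrix} 1 & \kappa_{12} & \kappa_{13} & \cdots & \kappa_{1L} \\ \kappa_{21} & 1 & \kappa_{23} & \cdots & \kappa_{2L} \\ \kappa_{31} & \kappa_{32} & 1 & \ddots & \vdots \\ \vdots & \vdots & \ddots & \ddots & \kappa_{(L-1)L} \\ \kappa_{L1} & \kappa_{L2} & \cdots & \kappa_{L(L-1)} & 1 \end{bmatrix} \tag{17.17}$$

with $\kappa_{mn}$ being the correlation coefficient at the $(m,n)$-th entry of $\mathbf{R}_{K^{-1}}$ for $m \ne n$. It turns 
out that $\varsigma_l$ in $\mathbf{D}_{K^{-1}}$ can be related to $\sigma_l^2$ by the following relation 

$$\varsigma_l^2 = \sigma_l^{-2}\left(1 - r_{L-l}^2\right)^{-1} \tag{17.18}$$

where $r_{L-l}^2$ is the squared multiple correlation coefficient of band $B_l$ on the other $L - 1$ bands 
$\{B_k\}_{k \ne l}$ obtained by using the multiple regression theory. So, $\varsigma_l^2$ is the reciprocal of a 
good noise variance estimate of band $B_l$. It should be noted that the $\mathbf{D}_{K^{-1}}$ in (17.16) is 
not an inverse of the $\mathbf{D}_K$ in (17.14), nor is $\mathbf{R}_{K^{-1}}$ in (17.16) an inverse of $\mathbf{R}_K$. The major 
advantage of using $\varsigma_l$ over $\sigma_l$ is that, as shown in (17.18), $\varsigma_l$ removes the correlation of 
band $B_l$ with the other bands $B_k$ for $k \ne l$, while $\sigma_l$ does not. 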
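
The residual-based estimate above amounts to taking reciprocals of the diagonal of the inverse sample covariance matrix. The following is a minimal sketch assuming a row-per-pixel data matrix; the function names `residual_noise_variances` and `residual_noise_covariance` are hypothetical, and the diagonal form of the noise covariance matrix reflects the assumption that noise is uncorrelated from band to band.

```python
import numpy as np


def residual_noise_variances(pixels):
    """Per-band noise variance estimates 1 / [K^{-1}]_{ll}.

    pixels: array of shape (N, L), one L-band pixel vector per row
            (layout is an assumption of this sketch).
    """
    # Sample covariance matrix K with 1/N normalization.
    cov = np.cov(pixels, rowvar=False, bias=True)
    # Diagonal of K^{-1}, i.e. the squared entries of D_{K^{-1}}.
    inv_diag = np.diag(np.linalg.inv(cov))
    return 1.0 / inv_diag


def residual_noise_covariance(pixels):
    """Diagonal noise covariance estimate built from the per-band variances."""
    return np.diag(residual_noise_variances(pixels))
```

By the Schur complement identity, each returned value equals the mean squared residual left after regressing the (mean-removed) band on all other bands, which is why it discounts the part of the band variance that the other bands can explain.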

17.3.2 Inter/Intra-Band Prediction Noise Estimation: Spatial/Spectral Prediction 
Noise Estimation (Roger and Arnold, 1996) 

Let $x_{ijl}^{k}$ be the pixel at the spatial location $(i,j)$ in $B_l^k$, which is the $k$-th block 
of the $l$-th band $B_l$. We use a linear regression model to describe the intra/inter-band 
correlation as follows. 

(17.19) 

where the coefficients in (17.19) are the linear regression coefficients that need to be 
determined. 

Now we define the residual error of $x_{ijl}^{k}$ estimated by $\hat{x}_{ijl}^{k}$ via (17.19) by 

$$r_{ijl}^{k} = x_{ijl}^{k} - \hat{x}_{ijl}^{k}. \tag{17.20}$$

So, the residual error of $B_l^k$ is 

(17.21) 

In order to estimate the noise variance of $B_l^k$, Roger and Arnold (1996) proposed a 
method to estimate (17.21) subject to the constraint that the mean of the residual errors 
incurred within the block must be zero, i.e., $\sum_{(i,j) \in B_l^k} r_{ijl}^{k} = 0$. The resulting constrained 
solution was obtained by 

(17.22) 

where $N(B_l^k)$ is the total number of pixels within $B_l^k$. 

By means of (17.22) we can further estimate the noise variance of $B_l$ from the block 
estimates, written here as $\sigma_{l,k}^2$, by 

$$\sigma_l^2 = \text{mean of } \{\sigma_{l,k}^2\}_k \quad \text{or} \quad \sigma_l^2 = \text{median of } \{\sigma_{l,k}^2\}_k. \tag{17.23}$$

Since the noise is generally random, the band-to-band noise correlation can be 
assumed to be zero or weak, and the noise covariance matrix can be estimated by 
a diagonal matrix with the $(l,l)$-th diagonal element given by $\sigma_l^2$ in (17.23), that is, 
$\mathrm{diag}\{\sigma_1^2, \sigma_2^2, \ldots, \sigma_L^2\}$. 
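
A block-wise regression estimate of this kind can be sketched as follows. This is an illustration only: the choice of regressors (an intercept plus the two spectrally adjacent bands), the block size, the degrees-of-freedom normalization and the name `block_regression_noise_variance` are all assumptions of the sketch rather than the exact model of Roger and Arnold. The intercept term is what forces the residuals within each block to have zero mean.

```python
import numpy as np


def block_regression_noise_variance(cube, band, block=8):
    """Median over blocks of the residual variance of one band.

    cube:  array of shape (rows, cols, L).
    band:  index of the band to analyse (needs a neighbour on each side).
    block: side length of the square, non-overlapping blocks.
    """
    rows, cols, n_bands = cube.shape
    if not 0 < band < n_bands - 1:
        raise ValueError("band needs a spectral neighbour on each side")
    estimates = []
    for r in range(0, rows - block + 1, block):
        for c in range(0, cols - block + 1, block):
            blk = cube[r:r + block, c:c + block, :].reshape(-1, n_bands)
            target = blk[:, band]
            # Assumed regressors: intercept and the two adjacent bands.
            design = np.column_stack(
                [np.ones(len(blk)), blk[:, band - 1], blk[:, band + 1]]
            )
            coef, *_ = np.linalg.lstsq(design, target, rcond=None)
            resid = target - design @ coef  # zero mean thanks to the intercept
            dof = len(blk) - design.shape[1]
            estimates.append(resid @ resid / dof)
    # Median over blocks, one of the two options in (17.23).
    return float(np.median(estimates))
```

Using the median rather than the mean makes the band estimate less sensitive to blocks that straddle edges or other structure the linear model cannot explain.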



17.4 NOISE ESTIMATION-BASED EIGEN-THRESHOLDING 



In many signal detection applications, a common practice is noise whitening, which 
generally improves and enhances signal detectability. In order for a noise whitening process 
to be effective, the noise second-order statistics must be estimated accurately or at least 
reliably. The techniques introduced in Section 17.3 have shown success in estimating the 
noise covariance matrix of AVIRIS data and can be used for this purpose. 

17.4.1 Noise-Whitened HFC (NWHFC) Method 

The HFC method described in Section 17.2 does not have a noise whitening 
process. In this case, weak signal sources may be obscured by noise and cannot be 
detected. As an alternative, the HFC method can be modified by including a noise 
whitening process as a preprocessing step to remove the second-order statistical correlation 
so that signal sources can be decorrelated to achieve better signal detection. The resulting 
HFC method will be referred to as the noise-whitened HFC (NWHFC) method. 
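
The whitening step itself can be sketched as below. This is a minimal sketch, assuming a symmetric positive definite noise covariance estimate and a row-per-pixel data matrix; the name `whiten_noise` is hypothetical. The whitened pixels would then be passed to the same eigen-thresholding test used by the HFC method.

```python
import numpy as np


def whiten_noise(pixels, noise_cov):
    """Apply K_n^{-1/2} to every pixel so that the noise has unit variance.

    pixels:    array of shape (N, L), one pixel vector per row.
    noise_cov: (L, L) symmetric positive definite noise covariance estimate.
    """
    # Inverse square root of K_n through its eigen-decomposition.
    eigvals, eigvecs = np.linalg.eigh(noise_cov)
    inv_sqrt = eigvecs @ np.diag(eigvals**-0.5) @ eigvecs.T
    # Rows are pixels, so right-multiplying by the symmetric K_n^{-1/2}
    # whitens each pixel vector.
    return pixels @ inv_sqrt
```

For a diagonal noise covariance matrix this reduces to dividing each band by its noise standard deviation.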



17.4.2 Noise Subspace Projection (NSP) 



The effectiveness of the HFC and NWHFC methods relies on (17.10), which 
guarantees that $\mathrm{Cov}(\hat{\lambda}_l, \lambda_l) \to 0$ as $N \to \infty$. However, when the number of 
samples is not sufficiently large, $\mathrm{Cov}(\hat{\lambda}_l, \lambda_l)$ may not be zero. In order to resolve 
this problem, a noise subspace projection (NSP) method is developed in this section. It can 
be viewed as a hybrid of the HFC and NWHFC methods. 

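If we assume that the number of spectrally distinct signal sources is VD, the data 
sample covariance matrix $\mathbf{K}_{L\times L}$ can then be represented by $\mathbf{K}_{L\times L} = \mathbf{K}_s + \mathbf{K}_n$, where 
$\mathbf{K}_s$ is the covariance matrix of signals present in the image data and $\mathbf{K}_n$ is the noise 
covariance matrix. Using the eigen-decomposition of $\mathbf{K}_{L\times L}$ (Poor, 1994) we can express 
$\mathbf{K}_{L\times L}$ as 

$$\mathbf{K}_{L\times L} = \sum_{l=1}^{\mathrm{VD}} (\lambda_l + \sigma_l^2)\,\mathbf{u}_l \mathbf{u}_l^T + \sum_{l=\mathrm{VD}+1}^{L} \sigma_l^2\,\mathbf{u}_l \mathbf{u}_l^T \tag{17.24}$$

where $\{\mathbf{u}_l\}_{l=1}^{\mathrm{VD}}$ and $\{\mathbf{u}_l\}_{l=\mathrm{VD}+1}^{L}$ represent two sets of orthonormal vectors that span the signal 
space and the noise space respectively, and VD is the virtual dimensionality. The $\lambda_l$ and $\sigma_l^2$ 
in (17.24) correspond to the energies of signals and noise in the $l$-th component (i.e. $l$-th 
band) image respectively. The only problem of (17.24) that remains unsolved is to find 
VD. 

The HFC method used the difference between correlation-eigenvalues and 
covariance-eigenvalues as a criterion to determine VD. In this section, we take advantage 
of the eigen-decomposition of $\mathbf{K}_{L\times L}$ and premultiply $\mathbf{K}_{L\times L}$ in (17.24) by the diagonal 
matrix whose $l$-th diagonal entry is $\sigma_l^{-2}$. This results in the whitened covariance matrix, 
written here as $\tilde{\mathbf{K}}_{L\times L}$, 

$$\tilde{\mathbf{K}}_{L\times L} = \sum_{l=1}^{\mathrm{VD}} \left(\lambda_l / \sigma_l^2 + 1\right)\mathbf{u}_l \mathbf{u}_l^T + \sum_{l=\mathrm{VD}+1}^{L} \mathbf{u}_l \mathbf{u}_l^T \tag{17.25}$$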

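where the variances of the noise components have been whitened and normalized to 
unity. If we further assume that the energies of signals are greater than the noise energies, 
that is, $\lambda_l > \sigma_l^2$ for $l = 1,2,\ldots,\mathrm{VD}$, (17.4)-(17.6) become the following binary composite 
hypothesis testing problem 

$$H_0: z_l = \mu_l = 1 \quad \text{versus} \quad H_1: z_l = \mu_l = \lambda_l / \sigma_l^2 + 1 > 1 \quad \text{for } l = 1,2,\ldots,L \tag{17.26}$$

where 

$$p_0(z_l) = p(z_l \mid H_0) \cong N(1, \sigma_{z_l}^2) \quad \text{for } l = 1,2,\ldots,L \tag{17.27}$$

and 

$$p_1(z_l) = p(z_l \mid H_1) \cong N(\mu_l, \sigma_{z_l}^2) \quad \text{for } l = 1,2,\ldots,L \tag{17.28}$$

with $\sigma_{z_l}^2$ given by (17.9), i.e., $\sigma_{z_l}^2 = \sigma_{\lambda_l}^2$. 

Now using (17.26)-(17.28) we can derive the Neyman-Pearson detector $\delta_{NP}$ for 
(17.26), which can be used as a decision rule to determine VD. 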
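
The NSP test can be sketched along the same lines as the HFC test, except that only one matrix is needed and the eigenvalues are compared with unity. This is a minimal sketch under stated assumptions: the name `nsp_vd` is hypothetical, the noise covariance is taken to be diagonal, and the threshold uses the Gaussian tail of (17.27) with the asymptotic variance $2\lambda_l^2/N$.

```python
import numpy as np
from statistics import NormalDist


def nsp_vd(pixels, noise_var, p_fa=1e-3):
    """Count whitened covariance-eigenvalues that test as greater than 1.

    pixels:    array of shape (N, L), one pixel vector per row.
    noise_var: length-L array of per-band noise variance estimates
               (diagonal noise covariance is an assumption of this sketch).
    p_fa:      false alarm probability P_F.
    """
    n_samples = pixels.shape[0]
    # Whiten so that the noise variance in every band is unity.
    whitened = pixels / np.sqrt(noise_var)
    cov = np.cov(whitened, rowvar=False, bias=True)
    lam = np.sort(np.linalg.eigvalsh(cov))[::-1]
    # Standard deviation of each eigenvalue, sqrt(2 * lambda_l^2 / N).
    sigma_z = np.sqrt(2.0 / n_samples) * lam
    # Under H0, z_l ~ N(1, sigma_z^2); threshold for the chosen P_F.
    tau = 1.0 + NormalDist().inv_cdf(1.0 - p_fa) * sigma_z
    return int(np.sum(lam > tau))
```

Because the test depends only on the whitened data, rescaling the pixels and the noise variances consistently leaves the estimate unchanged.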

17.4.3 AIC and MDL 

In Chapters 3, 8-9 and 15, subpixel detection and mixed pixel classification 
problems were formulated as linear mixing problems which could be solved by using a 
linear mixture model specified by (3.1) and (15.2). This approach requires the knowledge 
of the target signature matrix M. Although the unsupervised algorithms proposed in 
Chapter 5 can be used to find target signatures directly from the image data, they still 
need to know how many target signatures should be generated. This dilemma can be 
resolved by estimation of the VD. In passive sensor array processing, a similar 
problem also arises in estimating how many signal sources impinge upon the 
array. This problem is related to how to select an appropriate model for a parameterized 
family of probability density functions used to best fit the sensor array data. It is known 
that two commonly used methods, AIC (An Information Theoretic Criterion) suggested 
by Akaike (1973) and MDL (Minimum Description Length) proposed by Schwarz 
(1978) and Rissanen (1978), can be used for model selection. They can be derived from 
the following formulas that were obtained by Wax and Kailath (1985) 

$$\mathrm{AIC}(p) = -2 \log\left( \frac{\prod_{l=p+1}^{L} \lambda_l^{1/(L-p)}}{\frac{1}{L-p}\sum_{l=p+1}^{L} \lambda_l} \right)^{(L-p)N} + 2p(2L - p) \tag{17.29}$$

$$\mathrm{MDL}(p) = -\log\left( \frac{\prod_{l=p+1}^{L} \lambda_l^{1/(L-p)}}{\frac{1}{L-p}\sum_{l=p+1}^{L} \lambda_l} \right)^{(L-p)N} + \frac{1}{2}\,p(2L - p)\log N \tag{17.30}$$

where $p$ is the hypothesized number of signal sources, $p(2L - p)$ accounts for the number of 
free parameters that specify a family of probability density functions, and 
$\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_L$ are the eigenvalues generated by the sample covariance matrix 
$\mathbf{K}_{L\times L}$ given by $\mathbf{K}_{L\times L} = (1/N)\sum_{i=1}^{N} \mathbf{r}_i \mathbf{r}_i^T$. 

In Chapter 4 we described that a linear unmixing problem could be formulated as a 
passive sensor array problem by interpreting signal sources impinging on a sensor array 
as desired target sources to be detected from a set of co-registered images acquired by a 
bank of spectral channels. With this interpretation a linear mixing problem can be solved 
by techniques that solve sensor array problems. In order to take advantage of these two 
criteria and apply them to the estimation of the VD, we need to examine their underlying 
assumptions: (1) the noise must be an independent identically distributed (i.i.d.) process 
and (2) the observation process is a zero-mean Gaussian random process. Part of the first 
assumption can be taken care of by noise whitening based on the noise estimates of 
Section 17.3, in which case the noise variance will be normalized to unity. The resulting 
sample covariance matrix is then treated as that of the Gaussian process in the second 
assumption. Now, the problem of estimating VD can be solved by minimizing (17.29) and 
(17.30) as follows. 

$$\mathrm{VD}_{\mathrm{AIC}} = \arg\{\min_p \mathrm{AIC}(p)\} \tag{17.31}$$

$$\mathrm{VD}_{\mathrm{MDL}} = \arg\{\min_p \mathrm{MDL}(p)\}. \tag{17.32}$$
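
The two criteria can be evaluated directly from a set of covariance eigenvalues. The following is a minimal sketch of the Wax-Kailath expressions, assuming strictly positive eigenvalues; the name `aic_mdl_vd` and the search range $p = 0, \ldots, L-1$ are choices made for illustration. The log of the geometric-to-arithmetic mean ratio is computed as a difference of logs to avoid forming large products.

```python
import numpy as np


def aic_mdl_vd(eigvals, n_samples):
    """Return (VD_AIC, VD_MDL), the minimizers of AIC(p) and MDL(p).

    eigvals:   strictly positive covariance eigenvalues (any order).
    n_samples: number of samples N used to form the covariance matrix.
    """
    lam = np.sort(np.asarray(eigvals, dtype=float))[::-1]
    n_bands = lam.size
    aic = np.empty(n_bands)
    mdl = np.empty(n_bands)
    for p in range(n_bands):
        tail = lam[p:]  # lambda_{p+1}, ..., lambda_L
        # log(geometric mean / arithmetic mean) of the tail eigenvalues.
        log_ratio = np.mean(np.log(tail)) - np.log(np.mean(tail))
        log_lik = (n_bands - p) * n_samples * log_ratio
        penalty = p * (2 * n_bands - p)
        aic[p] = -2.0 * log_lik + 2.0 * penalty
        mdl[p] = -log_lik + 0.5 * penalty * np.log(n_samples)
    return int(np.argmin(aic)), int(np.argmin(mdl))
```

When the trailing eigenvalues are all equal, the likelihood term vanishes for every $p$ at or beyond the true number of sources, so the penalty term selects the smallest such $p$.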



17.5 COMPUTER SIMULATIONS AND HYPERSPECTRAL IMAGE EXPERIMENTS 

The experiments conducted in this section consist of computer simulations and real 
hyperspectral image experiments. The purpose of these experiments is to demonstrate the 
ability of the methods presented in this chapter and of two information theoretic criteria, 
AIC and MDL, to estimate the VD. As will be shown, the AIC and MDL generally do 
not work effectively if the assumption that the noise is i.i.d. is violated. 

17.5.1 Computer Simulations 

The data set contains five field reflectance spectra, dry grass, red soil, creosote 
leaves, blackbrush and sagebrush, shown in Fig. 1.5. Using the spectra in Fig. 1.5, each 
spectral signature was used to generate 1000 simulated pixels with abundance generated by 
random numbers in the range (0,1); they were then mixed to form 1000 simulated mixed 
pixels. Additive Gaussian noise was added to achieve a signal-to-noise ratio of 25:1 for the 
entire data. In order to simulate an i.i.d. case, Gaussian noise with the same variance was 
added to each band. Fig. 17.1(a)-(c) shows the log eigen-spectra generated by the HFC, the 
NWHFC and NSP-based eigen-thresholding methods respectively, where the eigenvalues 
produced by $\mathbf{R}_{L\times L}$ and $\mathbf{K}_{L\times L}$ are denoted by "*"s and "o"s respectively. As noted in Fig. 
17.1, when the eigenvalue index increases, the correlation-eigenvalues begin to overlap 
the covariance-eigenvalues. This phenomenon can be explained by the fact that noise is 
the only source contributing to these eigenvalues. With this interpretation the VD 
was estimated by the HFC, the NWHFC and NSP-based eigen-thresholding methods to 
be 5, 4 and 4 respectively using the Neyman-Pearson test with the false alarm 
probability set to $P_F = 0.001$. The HFC eigen-thresholding method correctly 
predicted the true data dimensionality, which is 5, while both the NWHFC and NSP-based 
eigen-thresholding methods predicted 4. Since the spectra of creosote leaves and 
blackbrush shown in Fig. 1.5 are so close, they were considered to be the same signature 
by the NWHFC and NSP methods. 




Figure 17.1. Computer simulation with 5 AVIRIS signatures: log eigen-spectra versus eigenvalue index for (a) HFC, (b) NWHFC, (c) NSP 



This is also true for the AIC and MDL where their VD was estimated to be 4 as 
shown in Fig. 17.2(a-b). 





Figure 17.2. Computer simulation with 5 AVIRIS signatures with i.i.d. Gaussian noise: (a) AIC, (b) MDL 



Unlike the above simulation with an equal noise variance added to each individual 
band, we simulated another scenario in which additive Gaussian noise with a 
different variance was added to each individual band to achieve the fixed signal-to-noise 
ratio (SNR) = 25:1. In this case, the additive Gaussian noise in each band was white but 
not identically distributed from band to band. Interestingly, the values of VD estimated 
by the three eigen-thresholding based methods remained unchanged, but the AIC and 
MDL completely failed as shown in Fig. 17.3, in which their estimated VD was 139 and 
110 respectively. This second simulation demonstrates that the i.i.d. 
assumption is crucial to the success of the AIC and MDL, while it has very little effect 
on the eigen-thresholding based methods. 
Table 17.1 summarizes the VD results estimated by the HFC, NWHFC, NSP, AIC 
and MDL for the two scenarios of simulated Gaussian noise, i.i.d. and non-i.i.d. cases. 



Table 17.1. Estimation of VD 

                                            HFC   NWHFC   NSP   AIC   MDL
i.i.d. Gaussian noise (1st scenario)          5       4     4     4     4
non-i.i.d. Gaussian noise (2nd scenario)      5       4     4   139   110


Similar experiments were also conducted for various SNRs. It was found that when 
the SNR is 25:1 or higher, the VD results remained the same, 5, 4, 4, 4, 4 
for the HFC, NWHFC, NSP, AIC and MDL methods respectively. However, if the SNR 
is lower than 25:1, the VD estimated by the HFC, NWHFC, NSP and MDL was 3, with 
two exceptions recorded in Table 17.2: the HFC yielded 4 at SNR 20:1, and the AIC 
yielded 3 for SNR 10:1 and 4 for SNR greater than 10:1. Table 17.2 summarizes the results. 



Table 17.2. Estimation of VD for various SNRs 

              HFC   NWHFC   NSP   AIC   MDL
SNR = 10:1      3       3     3     3     3
SNR = 15:1      3       3     3     4     3
SNR = 20:1      4       3     3     4     3
SNR = 25:1      5       4     4     4     4
SNR = 30:1      5       4     4     4     4
SNR = 35:1      5       4     4     4     4
SNR = 40:1      5       4     4     4     4


This makes sense. Due to increased noise interference the blackbrush becomes 
indiscernible from creosote leaves and sagebrush as shown in Chang (2000), Du (2000) 
and Du and Chang (2000). In this case, the three signatures will be considered to belong 
to one class and this class along with dry grass and red soil makes up three distinct 
classes in computer simulations. 

17.5.2 AVIRIS and HYDICE Image Experiments 

In analogy with the computer simulations, the AVIRIS image in Fig. 1.6 was used for 
experiments. The log eigen-spectra generated by the HFC, NWHFC and NSP-based 
eigen-thresholding methods are shown in Fig. 17.4(a-c) respectively, with "*"s and "o"s 
denoting the eigenvalues produced by $\mathbf{R}_{L\times L}$ and $\mathbf{K}_{L\times L}$ respectively. Once again, using 
the Neyman-Pearson test with the false alarm probability set to $P_F = 0.001$, the VD 
predicted by the HFC, NWHFC and NSP-based eigen-thresholding methods was 4, 5 and 8 
respectively. 
Figure 17.4. AVIRIS image: (a) HFC, (b) NWHFC, (c) NSP 



As shown in Fig. 5.9, the unsupervised classification results produced by the 
UNCLS-OSP classifier detected at least 5 target signatures, cinders, playa, rhyolite, 
shade and vegetation, in the image scene plus an anomaly at the upper edge of the dry lake. 
This implies that the VD must be equal to or greater than 6. As a matter of fact, due to the 
wide coverage of the dry lake, the playa signature varied significantly from its top to 
bottom. Also, according to the results shown in Fig. 15.8, three areas within the dry 
lake, the edge, center and bottom regions, were found to have different playa signatures. If 
these three signatures are considered to be different signal sources, the VD turns out to be 
8, which is exactly the value estimated by the NSP method. When the AIC and the 
MDL were further applied to the AVIRIS image scene, the estimated VD was 150 for 
AIC and 121 for MDL, as shown in Fig. 17.5. These values are significantly higher than the 
numbers estimated by the three eigen-thresholding based methods. 




Figure 17.5. AVIRIS image: (a) AIC, (b) MDL 



As illustrated in the computer simulations, the noise in remotely sensed imagery is 
generally not i.i.d. as usually assumed in array processing, so the VD 
estimated by the AIC and the MDL for the AVIRIS image was largely affected by the 
violation of this assumption. Consequently, the AIC and MDL performed poorly in the 
estimation of VD. 

Similar experiments were also conducted for the 15-panel HYDICE image in Fig. 
1.7 using the HFC, the NWHFC and the NSP-based eigen-thresholding methods. Fig. 
17.6(a-c) shows the distributions of the log eigenvalues, with "*"s and "o"s produced by 
$\mathbf{R}_{L\times L}$ and $\mathbf{K}_{L\times L}$ respectively. 




Figure 17.6. 15-panel HYDICE image: (a) HFC, (b) NWHFC, (c) NSP 
Once again, the VD predicted by the HFC, NWHFC and NSP-based 
eigen-thresholding methods was 13, 15 and 20 respectively with the false alarm 
probability set to $P_F = 0.001$. As shown in the experiments in Example 15.5 of Section 
15.4.2, these values were all adequate because all the 15 panels could be detected and 
separated in the first 6 components with p = VD+1. However, in order for $N_{LSRMA}$ to be 
stable, the $N_{LSRMA}$ obtained for this 15-panel HYDICE scene was shown to be 17 for 
skewness or 18 for kurtosis, which are very close to VD = 20 estimated by the NSP method. 
If we further examine the unsupervised classification results obtained by the UTGP in 
Fig. 5.17, the number of target signatures required to detect and separate the 15 panels 
was shown to be 20. These two experiments demonstrate that the NSP method produces a 
very good estimate for the VD. 

If the AIC and the MDL were used, the estimated VD was 124 and 85 respectively, 
as shown in Fig. 17.7. Once again, the AIC and the MDL overestimated the VD. The real 
hyperspectral image experiments demonstrate that the AIC and the MDL may not be 
appropriate criteria for estimating the VD of remotely sensed images. 





Figure 17.7. Estimation of VD by (a) AIC and (b) MDL 



Table 17.3 summarizes the VD results obtained by the HFC, the NWHFC, the NSP-based 
eigen-thresholding methods, the AIC and the MDL for the AVIRIS and HYDICE 
images. 



Table 17.3. Estimation of VD for real hyperspectral images 

                          HFC   NWHFC   NSP   AIC   MDL
AVIRIS image                4       5     8   150   121
15-panel HYDICE image      13      15    20   124    85



As a final comment, the value of VD only provides an estimate of how many 
spectrally distinct signal sources are assumed to be in the image data. It is determined by 
intrinsic spectral properties, not by a specific unsupervised method. For example, when 
the UNCLS-OSP method was applied to this 15-panel scene, 34 target signatures had to 
be generated in order to separate the 15 panels into five different target classes, as 
shown in Fig. 5.20. However, as shown in Fig. 5.17, only 20 target signatures were 
needed for UTGP-OSP to correctly classify these 15 panels into their own classes. As we 
can see from Fig. 5.20, most of the 34 target signatures are either natural or background 
signatures. Some of these signatures may belong to the same signal sources, but were 
forced to split by the algorithms. So, the number of spectrally distinct signal sources is 
generally fewer than 34. Figs. 15.19-15.20 in Chapter 15 also confirm this. 
Interestingly, it was found that if the UTGP-OSP was applied to the AVIRIS scene, 12 
target signatures were required to classify all five target signatures, cinders, playa, 
rhyolite, shade and vegetation, shown in Fig. 13.5. This number was higher than the 8 that 
was required for both the UNCLS-OSP and the LSRMA to classify all the five target 
signatures in Figs. 5.9 and 15.8 respectively. These real AVIRIS and HYDICE image 
experiments demonstrate that a specific unsupervised algorithm cannot be used to 
determine the VD. Doing so requires methods that are particularly designed for this purpose. 
The three methods, HFC, NWHFC and NSP, proposed in this chapter seem to fit 
this need. 



17.6 CONCLUSIONS 

Determination of VD is a difficult and challenging problem. This chapter presents 
three Neyman-Pearson detection theory-based eigen-thresholding methods. The key idea 
of these methods is to formulate the VD determination as a Neyman-Pearson detection 
problem. The first approach, proposed by Harsanyi, Farrand and Chang (1994), 
casts the eigenvalue difference between the sample correlation matrix and sample 
covariance matrix as a binary composite hypothesis testing problem with the 
Neyman-Pearson detector used as a binary decision maker. An alternative approach, the 
noise-whitened HFC (NWHFC) method, incorporates a whitening 
process to decorrelate the sample correlation matrix. These two approaches assume that 
the sample size is sufficiently large so that the covariance between eigenvalues of 
the sample correlation matrix and the sample covariance matrix will converge to zero. In 
order to relax this assumption, a third approach, the noise subspace projection (NSP) 
method, is further developed, which can be derived from NWHFC. It makes use of an 
NSP-based whitening process prior to the Neyman-Pearson detection. Instead of 
comparing correlation-eigenvalues and covariance-eigenvalues as done in HFC and 
NWHFC, the NSP method needs only one of the two sample matrices. 
The AIC and MDL have been widely used in passive sensor array processing to estimate 
the number of signals impinging on the array. In order to see their utility in estimation 
of VD for remotely sensed images, they were also included for comparison. For real 
hyperspectral image experiments, the results showed that all three eigen-thresholding 
based methods, the HFC, NWHFC and NSP methods, produced very close values of VD 
while the AIC and MDL overestimated VD significantly. This is because the noise 
in remote sensing images is generally not i.i.d., which is a crucial assumption for AIC 
and MDL to be effective criteria. As a final comment, the methods proposed in this 
chapter only provide an estimate of the number of spectrally distinct signal sources assumed 
to be present in image data, which is generally greater than ID. 
As a final chapter, we will highlight several features that are considered to be unique 
in this book and conclude with a number of interesting techniques that we are 
unable to cover in this book. The most unique feature is a wide coverage of statistical 
signal processing techniques that are applied to subpixel detection and mixed pixel 
classification with various functional taxonomies. On one end, the techniques are 
categorized according to the level of target knowledge: supervised, unsupervised and 
automatic. On the other end, the techniques are classified according to constraints imposed 
on targets of interest: target abundance constraint and target signature constraint. A second 
unique feature is inclusion of sufficient details of mathematical derivations of all the 
presented techniques that make this book self-contained. In addition, all the relevant 
references are also included where they are necessary. A third feature is extensive 
experiments accompanied by comparative study and quantitative analysis to illustrate 
each described technique. A fourth unique feature is an introduction of a 3-D ROC 
analysis that can be used to evaluate detection and classification performance. A fifth 
unique feature is a study of issues of sensitivity to level of target knowledge and noise, 
which play a significant role in design of techniques. Finally, another unique feature is 
a demonstration of the ability of the presented techniques in real-time processing through 
experiments. 



18.1 FUNCTIONAL TAXONOMY OF TECHNIQUES 

The techniques presented in this book can be categorized according to their 
functionalities, subpixel detection and mixed pixel classification. In general, it is 
difficult to draw a line between subpixel detection and mixed pixel classification. A 
subpixel detection technique can be used for mixed pixel classification, in which case it 
must be implemented once for each target class. In this case, a stack of images, each of 
which shows detection results of a different target, can be used for classification. On the 
other hand, a mixed pixel classification technique can also be used for target detection 
without classification. Therefore, a logical presentation for this book is to first consider 
subpixel detection in various aspects, and then expand subpixel detection to mixed pixel 
classification. The first subpixel detection technique considered is the orthogonal 
subspace projection (OSP), which is a supervised and unconstrained technique. This 
technique is then extended to two supervised target-abundance constrained techniques, 
sum-to-one constrained least squares (SCLS) and non-negatively constrained least squares 
(NCLS) methods, which impose different constraints on the abundance fractions of 
targets of interest. Since target abundance-constrained techniques require complete 
prior target knowledge of the image data, another type of constrained technique is 
developed to reduce the level of required target knowledge. Two supervised target 
signature-constrained techniques are presented, which only require the knowledge of 
targets of interest with no need of knowing the image background. They are constrained 
energy minimization (CEM) and target-constrained interference-minimized filter 
(TCIMF), both of which impose constraints on vector directions of target signatures 
rather than target abundance fractions. In order to eliminate dependency on target 
knowledge, automatic subpixel detection is further considered. Two types of automatic 
subpixel detection are of interest. One is unsupervised subpixel detection, which 
automatically generates necessary target information directly from the image data by 
unsupervised means. Information obtained in this way is referred to as a 
posteriori target information. Three algorithms are introduced, which are unsupervised 
vector-quantization (UVQ), unsupervised target generation process (UTGP) and an 
unsupervised least squares error-based algorithm. Another is anomaly detection, which 
does not require any a priori or a posteriori target information. Two anomaly detectors, 
RX detector (RXD) and low probability detector (LPD), along with their variants are 
studied. Since the a posteriori target information is obtained from the image data, it may 
not be accurate. In this case, the sensitivity to target knowledge and noise must be 
addressed. Unfortunately, this issue has not been investigated in the past. So, as a 
conclusion of subpixel detection, the sensitivity problem is further considered. 

Following an analogous approach to subpixel detection, mixed pixel classification is 
presented in the same manner. It first begins with supervised and unconstrained mixed 
pixel classification in which a family of OSP-based classifiers is derived. It extends the 
OSP detector to an OSP classifier using a least-squares subspace projection approach. 
Three a posteriori OSP classifiers, signature subspace projection (SSP) classifier, target 
subspace projection (TSP) classifier and oblique subspace projection (OBSP) classifier 
are developed. A detailed experiment-based comparative study and analysis among these 
least-squares subspace projection based classifiers is also conducted. In order for mixed 
pixel classifiers to further perform material quantification, supervised fully-constrained 
mixed pixel classification is considered which imposes two constraints on target 
abundance, abundance sum-to-one constraint (ASC) and abundance non-negativity 
constraint (ANC). Two methods are developed for finding solutions numerically, fully 
constrained least squares (FCLS) method and modified fully constrained least squares 
(MFCLS) method, both of which include the SCLS solution as an initial estimate to 
produce solutions. In analogy with subpixel detection, supervised target signature- 
constrained mixed pixel classification techniques are also developed to extend CEM and 
TCIMF subpixel detection techniques to mixed pixel classification, which are linearly 
constrained minimum variance (LCMV) classifiers. It is known that Fisher’s linear 
discriminant analysis has been widely used as one of the most successful pattern 
classification techniques. This technique is further explored for hyperspectral data 
exploitation and extended to the linearly constrained discriminant analysis (LCDA) where 
the directions of the obtained linear discriminants must be aligned with mutually 
orthogonal directions. 
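
The abundance sum-to-one constraint admits a closed-form least-squares solution, which is the kind of SCLS estimate that can serve as an initial estimate for FCLS and MFCLS. The following is only a minimal sketch in Python with NumPy, not the implementation used in this book. The function name scls, the synthetic signature matrix M and the noise level are assumptions made for illustration. The ANC has no closed-form solution and requires an iterative procedure that is not shown here.

```python
import numpy as np

def scls(M, r):
    """Least-squares abundance estimate subject to the sum-to-one constraint (ASC).

    M : (L, p) target signature matrix, one signature per column.
    r : (L,) image pixel vector.
    """
    G = np.linalg.inv(M.T @ M)          # (M^T M)^{-1}
    a_ls = G @ M.T @ r                  # unconstrained least-squares estimate
    ones = np.ones(M.shape[1])
    # Lagrange-multiplier correction that forces the abundances to sum to one
    correction = (G @ ones) * (ones @ a_ls - 1.0) / (ones @ G @ ones)
    return a_ls - correction

# Synthetic example (assumed data): 3 signatures observed in 50 bands
rng = np.random.default_rng(0)
M = rng.random((50, 3))
a_true = np.array([0.2, 0.5, 0.3])
r = M @ a_true + 0.01 * rng.standard_normal(50)
a_hat = scls(M, r)
```

Nothing in this estimate prevents negative abundances, which is why the ANC has to be imposed separately for material quantification.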

Like subpixel detection, mixed pixel classification can be also carried out without a 
priori target knowledge as unsupervised mixed pixel classification and anomaly 
classification. Two algorithms are developed for unsupervised mixed pixel classification, 
which are the desired target detection and classification algorithm (DTDCA) for 
reconnaissance applications, and automatic target detection and classification algorithm 
(ATDCA) for surveillance applications. Anomaly classification is a hybrid of anomaly 
detection and classification. It requires an additional measure for target discrimination 
bridging the gap between detection and classification. In addition to unsupervised mixed 
pixel classification and anomaly detection, a new set of mixed pixel classification 
techniques that are not considered in subpixel detection is projection pursuit-based 
techniques where neither a priori nor a posteriori target knowledge is required for 
classification. They use higher orders of statistics to detect and classify interesting targets 
in the image. Two approaches are developed along this line. One is a Kullback-Leibler 
information distance-based projection pursuit that uses third-order and fourth-order 
statistics for automatic mixed pixel classification. Another is an independent components 
analysis-based linear spectral random mixture analysis (LSRMA). It can be considered as 
a random version of the linear spectral mixture analysis where the targets are assumed to 
be random signal sources. One major issue for automatic mixed pixel classification is 
how many targets are present in the image data. This is closely related to determination 
of the intrinsic dimensionality (ID) of the image data. So, as a final topic in the book, 
we introduce a new definition of virtual dimensionality (VD), which is the number of 
spectrally distinct target signatures resident in the image data. Three eigen-threshold 
based techniques are also derived to determine appropriate VD for given image data. 



18.2 MATHEMATICAL TAXONOMY OF TECHNIQUES 

The techniques discussed in this book can be categorized into five groups, (1) linear 
mixture model-based methods, (2) linearly constrained minimum variance-based 
methods, (3) maximum likelihood estimation-based methods, (4) discriminant analysis- 
based methods and (5) projection pursuit-based methods. The first group of techniques 
consists of OSP and least-squares approaches, which include unconstrained or constrained 
methods. These are OSP, SSP, TSP, OBSP, SCLS, NCLS, FCLS and MFCLS 
methods discussed in Chapters 3, 8-10, 13 and 15. The techniques in this group require 
the complete target knowledge for the linear mixture model, which can be obtained in 
either a supervised or an unsupervised fashion. They take advantage of spectral 
information and correlation provided by the target signature matrix to achieve subpixel 
detection and mixed pixel classification. The second group is discussed in Chapters 4 
and 11, and is made of CEM, TCIMF and LCMV methods, which constrain targets of 
interest while minimizing the energy of unknown signal sources. So, the techniques in 
this group do not need to know image background. The only required knowledge is the 
targets of interest, which can be also obtained a priori or by an unsupervised means. 
Since they do not have full knowledge of targets in the image data, they make use of 
spectral correlation among all the image pixel vectors to approximate the information that 
could have been provided by undesired target signatures if the complete target knowledge 
were given, as is the case for the first group of techniques. The third group contains the 
Gaussian maximum likelihood (GML) classifier in Chapter 8 and the RXD discussed in 
Chapter 6 and Chapter 14. It takes advantage of a Gaussian form to derive analytical 
optimal linear filters. The difference between these two techniques is that the GML 
classifier only uses the second-order band-to-band spectral correlation within a single 
image pixel vector, whereas RXD uses the second-order spectral correlation among image 
pixel vectors, i.e. sample spectral correlation matrix formed by all the image pixel 
vectors in the image data. Surprisingly, the GML classifier can be shown to be the 
OBSP classifier if the noise in the linear mixture model is assumed to be white 
Gaussian. The techniques in the fourth group are the filter-vectors (FV) classifier in Chapters 
11-12 and the linearly constrained discriminant analysis (LCDA) classifier in Chapter 12. 
Interestingly, all the techniques in the first three groups produce linear filters, which 
operate in a matched filter form using different matched signatures. In particular, the 
LCDA classifier can be shown to be a constrained OSP classifier. The OSP and the FV 
classifiers can be derived as a whitened CEM in Chapter 4 (Chang, 2002a) and a 
whitened BRLCMV classifier in Chapter 11 (Chang, 2002b) respectively, in which case 
the image data are zero-mean and whitened. Recently, an information-processed matched 
filter approach has been proposed in Chang (2003) to interpret and illustrate these 
techniques based on a two-stage filter process, information processing filter followed by a 
matched filter. A diagram showing a mathematical taxonomy of these techniques is 
delineated in Fig. 18.1. 

Figure 18.1. Mathematical taxonomy of techniques 

In particular, several interesting techniques, OSP, CEM, LCMV, BRLCMV, 
LCDA, FV and RXD have been studied and analyzed in great depth. Their relationships 
have been also discussed in various places in the book, specifically, relationship among 
OSP, CEM, TCIMF in Sections 4.3 and 8.2.4, relationship between CEM and RXD in 
Section 6.4, relationship between BRLCMV and the FV method in Section 11.3, 
relationship between LCDA and the FV method in Section 12.4 and relationship between 
OSP and RXD in Chang (2002a, 2003). Fig. 18.2 summarizes various relationships 
among them. For example, LCMV is a very general target signature-constrained 
technique, which includes CEM, TCIMF, RXD, BRLCMV and the FV methods as its 
special versions, where the FV method can be considered a whitened version of 
BRLCMV. On the one hand, OSP is a versatile technique, which can be extended in various 
ways. For instance, LCDA and the FV method can be treated as constrained versions of 
OSP, where OSP can be shown to be a whitened version of CEM via TCIMF. In other 
words, if no sample spectral correlation matrix is involved in the filter design, 
OSP turns out to be a linear optimal filter. All other optimal filters can be considered as 
its variants. On the other hand, if the sample spectral correlation matrix is used to 
account for spectral correlation among sample image vectors, LCMV may be used as a 
universal filter for optimal design which yields different optimal linear filters with 
specific constraints. 
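
The contrast drawn above between a filter built from known undesired signatures (OSP) and a filter built from the sample spectral correlation matrix (CEM) can be illustrated with a short Python/NumPy sketch. It is offered only as an illustration under stated assumptions: the function names cem_filter and osp_filter, the random test data and the array layouts are not taken from this book, and the formulas are the commonly stated forms of the two filters.

```python
import numpy as np

def cem_filter(X, d):
    """CEM weight vector: minimize w^T R w subject to w^T d = 1.

    X : (N, L) image pixel vectors, d : (L,) desired target signature.
    Only d is needed; the background is summarized by R.
    """
    R = X.T @ X / X.shape[0]            # sample spectral correlation matrix
    Rinv_d = np.linalg.solve(R, d)
    return Rinv_d / (d @ Rinv_d)

def osp_filter(d, U):
    """OSP weight vector: project d onto the orthogonal complement of span(U).

    U : (L, p) undesired target signatures, one per column.
    No sample correlation matrix is involved.
    """
    P_perp = np.eye(len(d)) - U @ np.linalg.pinv(U)
    return P_perp @ d

# Assumed random data: 500 pixels, 20 bands, 2 undesired signatures
rng = np.random.default_rng(0)
X = rng.random((500, 20))
d = rng.random(20)
U = rng.random((20, 2))
w_cem = cem_filter(X, d)
w_osp = osp_filter(d, U)
```

In this sketch the CEM filter passes the desired signature with unit gain, while the OSP filter nulls every undesired signature exactly, which reflects the different kinds of target knowledge the two groups rely on.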




Figure 18.2. Various relationships among OSP, CEM, LCMV, TCIMF, BRLCMV, LCDA, FV and RXD 

The techniques in the fifth group discussed in Chapters 15-16 are considered as high- 
order statistics methods, which go beyond the second-order statistics-based techniques 
studied in the first four groups. They utilize projection indices measured by statistical 
independence in Chapter 15, and third and fourth moments in Chapter 16 to find interesting 
projections that can explore potential targets. The techniques in this group are particularly 
effective in detection of relatively small targets and anomalies. They can be thought of as 
a generalization of RXD, UTD and LPTD (Chang and Chiang, 2002), which use second- 
order statistics such as sample covariance/correlation matrix for anomaly detection. 
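
For reference, a second-order anomaly detector of the RXD type can be sketched in a few lines of Python with NumPy. This is a minimal illustration that assumes the commonly used Mahalanobis-distance form with the sample covariance matrix; the function name rxd, the synthetic data and the implanted anomaly are hypothetical and are not taken from the experiments in this book.

```python
import numpy as np

def rxd(X):
    """Mahalanobis-distance anomaly score for each pixel vector.

    X : (N, L) image pixel vectors. Returns an (N,) array of scores.
    """
    mu = X.mean(axis=0)
    Xc = X - mu
    K = Xc.T @ Xc / X.shape[0]          # sample covariance matrix
    Kinv = np.linalg.inv(K)
    # score_i = (x_i - mu)^T K^{-1} (x_i - mu)
    return np.einsum("ij,jk,ik->i", Xc, Kinv, Xc)

# Assumed data: Gaussian background with one implanted anomalous pixel
rng = np.random.default_rng(1)
X = rng.standard_normal((400, 10))
X[123] += 8.0
scores = rxd(X)
```

The implanted pixel receives the largest score because it departs from the second-order statistics of the remaining pixels. The higher-order methods of the fifth group replace this second-order measure with projection indices.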



18.3 EXPERIMENTS 

A great number of experiments are conducted in this book to demonstrate and 
illustrate the utility of each presented technique. Two types of hyperspectral image data, 
AVIRIS and HYDICE, are used for experiments. Although both data sets have the same 
spectral resolution of 10 nm, their spatial resolutions are different, 20 meters for AVIRIS 
and 1-4 meters for HYDICE. This difference allows us to examine more closely how 
spatial resolution affects the performance of spectral analysis-based non-literal techniques. 
One of the nice features of the HYDICE data used in the experiments is that they come with 
ground truth maps that provide the precise spatial locations of all the target pixels. By 
taking advantage of ground truth, a set of custom-designed criteria can be developed for 
comparative analysis and quantitative study. In addition to these two data sets, which are 
radiance data, a set of AVIRIS laboratory reflectance data is also used for computer 
simulations. This data set enables us to simulate various scenarios for performance 
evaluation. With these three data sets we can evaluate the strengths and weaknesses of 
each of the techniques presented in the book as well as the conditions under which these 
techniques perform well. Such an assessment shall provide a useful guideline for their 
practical implementation. 



18.4 ROC ANALYSIS FOR SUBPIXEL DETECTION AND MIXED PIXEL 
CLASSIFICATION 

Receiver operating characteristic (ROC) analysis has been widely used for signal 
detection in many applications. An ROC curve is a plot of detection probability (power) 
versus false alarm probability, which can be used to evaluate the effectiveness of a 
detector. It is generated by a binary hypothesis testing problem with the null hypothesis 
and the alternative hypothesis representing noise and signal plus noise respectively. A 
signal is detected only if the alternative hypothesis is true. In order to calculate detection 
probability and false alarm probability, the probability distributions must be known 
under each hypothesis. In reality, these two probability distributions must be estimated. 
If the estimated probability distributions are not accurate, using the resulting ROC curves 
for analysis will be misleading. In subpixel detection and mixed pixel classification the 
produced images are gray scale with gray level values representing abundance fraction 
estimates. The detection and classification is then performed by visual inspection of these 
abundance fractional images. In order to apply the ROC analysis, these abundance 
fractional images must be appropriately thresholded into binary images. However, such 
thresholding requires a likelihood ratio test. A general approach is to assume that the 
noise is Gaussian. Unfortunately, the Gaussian assumption is generally not valid in 
remote sensing images. An alternative is to use well-established thresholding 
techniques available in image processing, such as Otsu’s method (Otsu, 1979) and 
entropy-based methods (Pal and Pal, 1989). However, the techniques of this type are generally 
developed for pure-pixel images and may not be applicable to mixed-pixel remote 
sensing images. This book describes three techniques particularly designed to threshold 
gray scale abundance fractional images into binary images. Such thresholding is carried 
out by a process, called mixed-to-pure pixel conversion (MPCV), which converts mixed 
pixels in abundance fractional images to pure pixels. These three MPCV techniques are 
abundance percentage cut-off thresholding (fl%MPCV), winner-take-all thresholding 
(WTAMPCV) and probability distribution-based thresholding (y%MPCV). 

Since each particular threshold value produces a pair of false alarm probability and 
detection probability, normalizing the thresholding values to the range of [0,1] and 
varying the value used for thresholding from 0 to 1 results in a 3-D ROC curve where the 
three coordinates are specified by detection probability, false alarm probability and 
abundance fractions. With this interpretation a 3-D ROC analysis can be conducted. In 
particular, three 2-D curves can be further derived from a 3-D ROC curve and used as 
additional evaluation tools. One is the standard 2-D ROC curve, which is a plot of 
detection probability versus false alarm probability. Another is a 2-D curve plotted based 
on detection probability against abundance fractions. The third one is a 2-D curve, which 
shows a plot of false alarm probability versus abundance fractions. A combination of 
these three 2-D curves describes detailed interaction among the three parameters, detection 
probability, false alarm probability and abundance fractions. 
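
The construction of a 3-D ROC curve described above can be sketched as follows in Python with NumPy. This is only an illustration under simple assumptions: a pixel is declared a target when its normalized abundance estimate is at or above the threshold, the ground truth is a binary mask, and the function name roc_3d and the toy abundance values are hypothetical.

```python
import numpy as np

def roc_3d(abundance, truth, n_thresholds=11):
    """Return thresholds, detection probabilities and false alarm probabilities.

    abundance : estimated abundance fractions, one value per pixel.
    truth     : binary ground truth, 1 for target pixels and 0 otherwise.
    """
    a = np.asarray(abundance, dtype=float)
    a = (a - a.min()) / (a.max() - a.min())     # normalize to [0, 1]
    truth = np.asarray(truth, dtype=bool)
    taus = np.linspace(0.0, 1.0, n_thresholds)
    p_d = np.array([(a[truth] >= t).mean() for t in taus])
    p_f = np.array([(a[~truth] >= t).mean() for t in taus])
    return taus, p_d, p_f

# Toy example (assumed values): two target pixels and four background pixels
abundance = [0.9, 0.8, 0.3, 0.2, 0.1, 0.0]
truth = [1, 1, 0, 0, 0, 0]
taus, p_d, p_f = roc_3d(abundance, truth)
```

Plotting p_d against p_f gives the standard 2-D ROC curve, while plotting p_d or p_f against taus gives the other two 2-D curves derived from the 3-D ROC curve.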



18.5 SENSITIVITY ISSUES 

Another unique and important feature in this book is investigation of sensitivity of 
subpixel detection and mixed pixel classification to level of target knowledge and noise. 

18.5.1 Sensitivity to Level of Target Information 

In many practical applications, the target knowledge is generally not available and 
must be obtained directly from the image data. Such generated target knowledge is 
referred to as a posteriori target information and the level of target information is 
determined by how closely a posteriori target information approximates the true target 
knowledge. However, since the true target knowledge is usually unknown, the techniques 
designed for subpixel detection and mixed pixel classification must perform robustly and 
be insensitive to the target information used. There are three cases considered for level of 
target information. One is the most general case that a posteriori target information is 
not precise or accurate where the target information is either estimated or contaminated by 
unknown signal sources such as natural background. Another case is that the level of 
target information is provided by partial a priori knowledge rather than complete prior 
knowledge. A third case is that the level of target information is deteriorated by 
interference even if the target information is accurate. In this book, numerous experiments 
are conducted in Chapter 7 to simulate these three cases to produce various scenarios for 
performance evaluation. 

18.5.2 Sensitivity to Noise 

Noise sensitivity generally occurs in spectral filters that use the sample 
covariance/correlation matrix to suppress image background. Since the sample pixel 
vectors in image data used to form the sample covariance/correlation matrix are real 
image pixel vectors, noise effects must be factored into filter design. A general approach 
is to estimate the noise covariance matrix, and then pre-whiten the noise prior to target 
detection. However, finding a reliable estimate of the noise covariance matrix poses a 
challenging problem. Another approach is to estimate the true rank of the sample 
covariance/correlation matrix, and then use the singular value decomposition to eliminate 
the noise subspace from the original data space. In this case, how to reliably estimate the 
true rank is the key to success. This is closely related to the problem of determination of 
the data intrinsic dimensionality. In this book, both issues are addressed in Chapter 7 
and Chapter 15. A further study on determining how many spectrally distinct 
signatures are present is also provided in Chapter 17. 
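
The second approach, eliminating the noise subspace once the rank has been estimated, can be sketched in Python with NumPy as shown below. The sketch assumes the rank q is already known, which is precisely the difficult part in practice. The function name remove_noise_subspace and the synthetic low-rank data are hypothetical.

```python
import numpy as np

def remove_noise_subspace(X, q):
    """Project pixel vectors onto the q principal eigenvectors of the
    sample correlation matrix, discarding the remaining (noise) subspace.

    X : (N, L) image pixel vectors, q : assumed true rank.
    """
    R = X.T @ X / X.shape[0]                    # sample correlation matrix
    eigvals, eigvecs = np.linalg.eigh(R)        # eigenvalues in ascending order
    V = eigvecs[:, -q:]                         # q dominant eigenvectors
    return X @ V @ V.T

# Assumed data: rank-3 signal in 30 bands plus white noise
rng = np.random.default_rng(0)
S = rng.random((500, 3)) @ rng.random((3, 30))
X = S + 0.05 * rng.standard_normal(S.shape)
X_clean = remove_noise_subspace(X, q=3)
err_before = np.linalg.norm(X - S)
err_after = np.linalg.norm(X_clean - S)
```

Projecting onto the dominant eigenvectors of the sample correlation matrix is equivalent to truncating the singular value decomposition of the data matrix. If q is overestimated, noise dimensions are retained; if it is underestimated, signal is discarded.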



18.6 REAL-TIME IMPLEMENTATION 

Recent advances in computer technology have greatly improved our ability to 
display and process enormous amounts of digital data. These advances combined with 
recent advances in remote sensing sensor technology can now expand hyperspectral data 
applications. The need for real-time processing algorithms becomes increasingly 
imperative in applications such as law enforcement, military operations, environmental 
monitoring, and disaster and damage control, where real-time processing becomes crucial 
to resolving critical situations. It is also particularly useful and effective for detecting and 
locating moving targets such as moving vehicles, chemical vapors and plumes. So, another 
important unique feature of this book is that it demonstrates the feasibility of real-time 
implementation of various algorithms. Technically speaking, most of the algorithms 
presented in this book can be implemented on line, in real time or in near real time. Not 
only can subpixel detection be implemented in real time, as shown in Section 4.5 of Chapter 4 
(CEM, LCMV) and Chapter 6 (RXD), but mixed pixel classification can also be carried 
out in real time, as shown in Section 10.5 of Chapter 10 (FCLS), Section 
11.5 of Chapter 11 (LCMV) and Section 14.5 of Chapter 14 (RXD-LCMV anomaly 
classification). 



18.7 FURTHER TECHNIQUES 

In the past years we have witnessed a surge of fast development in algorithm design 
for hyperspectral imaging. Due to limited space the techniques presented in this book are 
mainly focused on linear unmixing methods for target detection and classification. We 
have not been able to address many other interesting and important approaches to 
hyperspectral imaging, which were also developed in the Remote Sensing Signal and 
Image Processing Laboratory at UMBC. These include the generalized orthogonal subspace 
projection approach (Ren and Chang, 2000); convex cone analysis (Ifarraguerri and 
Chang, 1999); projection pursuit (Ifarraguerri and Chang, 1999); interference-annihilated 
eigen-analysis (Chang and Du, 1999); Kalman filtering-based linear unmixing (Chang 
and Brumbley, 1999a, 1999b); band selection (Chang et al., 1999); linear mixture 
analysis-based data compression (Du et al., 2000); and the radial basis function neural network 
approach (Du and Chang, 1999; Guilfoyle et al., 2001). In what follows, we briefly 
describe each of these approaches and provide readers with a glimpse of their ideas. 

18.7.1 Generalized Orthogonal Subspace Projection 

The orthogonal subspace projection (OSP) has been shown to be a versatile 
technique and has also been successfully applied to a wide range of applications in 
hyperspectral image analysis. In order for OSP to be effective, the number of spectral 
bands must be greater than or equal to that of signatures to be detected or classified. This 
ensures that there are sufficient dimensions for orthogonal projection so that the target 
signatures resulting from OSP can be accommodated separately. In other words, different 
target signatures should be projected onto separate dimensions for detection and 
classification. Such an inherent constraint is not an issue for hyperspectral images since they 
generally have hundreds of spectral bands, which is usually more than the number of 
signatures resident in images. However, this may not be true for multispectral images 
where the number of signatures to be classified is likely greater than the number of 
spectral bands such as 3-band SPOT (Satellite Pour l'Observation de la Terre) images. 
This phenomenon was first observed in Chang and Brumbley (1999a, 1999b) when OSP 
was applied to 3-band SPOT data and four signatures were used for classification. It was 
found that OSP performed poorly in discriminating four signatures using 3-band SPOT 
data, particularly target signatures with similar spectra. More precisely, if we want to 
classify target signatures effectively using OSP, each target signature requires a separate 
dimension for orthogonal projection. If there is a dimension used to accommodate two or 
more target signatures, it is impossible to discriminate these signatures using a single 
dimension through orthogonal projection. This constraint, referred to as the Band Number 
Constraint (BNC) in Chang and Brumbley (1999a), points out an inherent limitation of 
OSP. In order to mitigate this dilemma, two approaches were proposed (Chang and 
Brumbley, 1999a, 1999b). One is to replace OSP by a Kalman filtering-based linear 
unmixing method, which combines a linear unmixing method with a Kalman filter 
(Chang and Brumbley, 1999a). Since this approach includes an auxiliary state equation 
in its Kalman filter design, it can be considered as an extension or a generalization of 
OSP. Another approach is to reduce the number of signatures to be classified in order to 
satisfy the BNC. But this approach requires that all possible combinations of signatures 
be examined from which a best classification result could be selected for final 
classification. Even in doing so, the results may still not be good (Chang and Brumbley, 
1999a). As an alternative, Ren and Chang (2000) developed a new approach, called 
generalized OSP (GOSP). Instead of reducing the number of target signatures, GOSP 
includes a band generation process, which creates extra new bands from the original 
spectral bands by nonlinear operations. These new images are generated in a nonlinear 
manner to capture nonlinear correlation between the original bands so that the 
classification performance can be further improved. Combining the original bands with 
these new nonlinearly correlated bands produces a new set of spectral bands so that the 
BNC can be relaxed to make OSP applicable to multispectral image classification. A 
similar idea using such band expansion was also applied to CEM, referred to as 
generalized CEM (GCEM) (Chang et al., 2000). 
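
The band generation idea can be illustrated with a short Python/NumPy sketch. The specific nonlinear operations used here, squared bands and pairwise band products, are an assumption made for illustration, since the text above only states that the new bands are produced by nonlinear operations. The function name expand_bands and the random signatures are hypothetical.

```python
import numpy as np
from itertools import combinations

def expand_bands(X):
    """Append nonlinearly generated bands to the original bands.

    X : (N, L) array, one row per pixel (or signature), one column per band.
    Returns an (N, 2L + L(L-1)/2) array: original, squared and cross-product bands.
    """
    squares = X ** 2
    products = [X[:, i] * X[:, j]
                for i, j in combinations(range(X.shape[1]), 2)]
    return np.column_stack([X, squares] + products)

# Assumed example: four signatures measured in three bands
rng = np.random.default_rng(0)
signatures = rng.random((4, 3))
expanded = expand_bands(signatures)
rank_before = np.linalg.matrix_rank(signatures)
rank_after = np.linalg.matrix_rank(expanded)
```

With three bands, four signatures cannot be linearly independent, so the BNC is violated. After expansion to nine bands, the four signatures in this example become linearly independent and each can be assigned a separate dimension.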

18.7.2 Convex Cone Analysis 

Convex cone analysis was first introduced by Mavrovouniotis et al. (1994a, 1994b) 
for classification of biological samples using time-resolved pyrolysis mass spectrometry 
at the US Army, Edgewood Research Development Engineering Center (ERDEC). Since 
then it has progressed to the point where it can produce useful results on many different 
types of multispectral and hyperspectral images, including chemical plumes. It can be 
used to uncover constituents in a chemical multi-component admixture. The idea is to 
model individual component spectral signatures as vertices of a convex cone. The 
objective of convex cone analysis (CCA) is to find the smallest possible convex cone that embraces 
all data samples resulting from mixtures of the vertices. The algorithm of Mavrovouniotis et al. (1994a, 
1994b) was improved by Ifarraguerri and Chang for 
hyperspectral image analysis (Ifarraguerri and Chang, 1999). The evidence provided by 
their experiments shows that convex cone analysis can be an effective and useful 
technique for detection and identification of chemical vapors, plumes and aerosol clouds. 
For CCA to be successful, several assumptions must be satisfied. One is the assumption 
that the spectral mixture model is valid for the data cube or some transform thereof. The 
CCA also requires that there exists significant relative abundance modulation of the 
various spectral signatures present. In other words, if two or more endmembers are 
present in the same relative proportion for every pixel in which they are present, they will 
appear to a convex cone as one endmember with a signature equivalent to the weighted 
sum of the individual signatures, where the weights are the relative abundances. This is 
true even if the total abundance of this endmember subset varies across the image. For 
chemical plumes arising from, e.g. an industrial process, it can be assumed that the 
various gases are well-mixed, and therefore have constant relative concentration. In this 
case, CCA alone will not be sufficient to uncover the individual signatures of the various 
plume constituents. Application of a spectral filter detector such as CEM is not practical 
here because the number of filters needed, combined with the number of pixels in the 
scene will typically make the task computationally prohibitive. However, CCA can be 
combined with analytical spectral analysis techniques, which rely on existing infrared 
spectral libraries to identify the constituents. In such an approach CCA is used as a way 
to obtain a "clean" spectrum where the influence of the background clutter has been 
sharply reduced. This clean spectrum is then automatically analyzed to determine the 
most likely set of chemical species in the plume. Once the plume constituents have been 
identified, their relative column contents (concentration-pathlength product) can be 
obtained by unmixing, using their absorption coefficient spectra. Additionally, if the 
plume temperature is known or can be estimated, the absolute column contents can be 
obtained. 

18.7.3 Kalman Filter-Based Linear Unmixing 

Kalman filtering has been shown to be a very powerful technique in signal 
processing and control applications. It implements two linear equations, referred to as 
measurement (output) equation and state equation with the former describing the system 
output and the latter characterizing the state of the system structure. There are several 
advantages resulting from the Kalman filtering approach. One is that the filter can be 
implemented recursively. It updates the equations based only on its immediate past, not 
on its entire past history as is done in Wiener filtering. Since the filter updates its 
information at every time instant, it can be applied to time-varying systems and 
nonstationary environments. Another is that due to its recursive structure it can be also 
implemented in real time. In order to take advantage of the success of the Kalman 
filtering in signal estimation, Chang and Brumbley (1999a, 1999b) recently investigated 
an approach, called Kalman filter-based linear unmixing (KFLU), which substituted the 
linear spectral mixture model specified by (3.1) for the measurement equation in a 
Kalman filter with the state equation characterized by a Gauss-Markov process. Since 
remotely sensed imagery is generally affected by many unknown 
signal sources, a realistic assumption is that the imagery is nonstationary. In this case, 
the state equation included in KFLU can capture the spectral variability from one pixel to 
another more effectively. Most linear unmixing methods such as OSP are spectral 
processing techniques developed on a single pixel basis. They discard the spatial 
correlation among pixels. The KFLU makes use of a state equation to keep track of 
inter-pixel correlation to improve abundance estimation derived from the measurement 
equation, that is, the linear spectral mixture model. Although the LCMV and RX detector 
also utilize the sample covariance/correlation matrix, they only capture spectral 
correlation among sample vectors and do not account for inter-pixel spatial correlation as 
does KFLU in the state equation. So, they can be considered as special cases of KFLU. 
Details of the KFLU treatment can be found in Chang and Brumbley (1999a, 1999b). 
Recently, an unsupervised KFLU was proposed to extend KFLU to the case that the 
target knowledge could be obtained directly from the image data (Wang and Chang, 
2002). 
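
To make the structure of such a filter concrete, the following Python/NumPy sketch runs a standard Kalman recursion in which the linear mixture model serves as the measurement equation. It is not the KFLU algorithm of Chang and Brumbley. The state equation is assumed here to be a simple random walk from pixel to pixel, and the function name kflu, the noise variances q and sigma2 and the simulated scan line are all illustrative assumptions.

```python
import numpy as np

def kflu(pixels, M, q=1e-3, sigma2=4e-4):
    """Track abundance vectors along a sequence of pixels.

    Measurement equation: r_k = M a_k + n_k   (linear mixture model)
    State equation:       a_k = a_{k-1} + w_k (assumed random walk)
    pixels : (N, L) pixel vectors in scan order, M : (L, p) signature matrix.
    """
    L, p = M.shape
    a = np.full(p, 1.0 / p)             # initial abundance estimate
    P = np.eye(p)                       # initial error covariance
    Q = q * np.eye(p)                   # state noise covariance
    Rn = sigma2 * np.eye(L)             # measurement noise covariance
    I = np.eye(p)
    estimates = []
    for r in pixels:
        P_pred = P + Q                  # prediction step (identity transition)
        S = M @ P_pred @ M.T + Rn       # innovation covariance
        K = P_pred @ M.T @ np.linalg.inv(S)     # Kalman gain
        a = a + K @ (r - M @ a)         # update with the current pixel only
        P = (I - K @ M) @ P_pred
        estimates.append(a)
    return np.array(estimates)

# Assumed scan line: abundances drift slowly from pixel to pixel
rng = np.random.default_rng(0)
N, L = 200, 20
M = rng.random((L, 3))
a1 = np.linspace(0.2, 0.6, N)
a_true = np.column_stack([a1, np.full(N, 0.3), 0.7 - a1])
pixels = a_true @ M.T + 0.02 * rng.standard_normal((N, L))
a_est = kflu(pixels, M)
mean_abs_err = np.abs(a_est - a_true).mean()
```

Each update uses only the previous estimate and the current pixel, which is the recursive property that makes real-time implementation possible.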

18.7.4 Interference-Annihilated Eigen-Analysis 

The goal of Principal Components Analysis (PCA) is to find principal components 
in accordance with the maximum variance of a data matrix. However, it has been shown 
recently that such variance-based principal components may not adequately represent 
image quality. As a result, a modified PCA approach based on maximization of 
signal-to-noise ratio, called the Maximum Noise Fraction (MNF) transformation or Noise-Adjusted 
Principal Components (NAPC) transform, was proposed to arrange principal components 
in decreasing order of signal-to-noise ratio (SNR) rather than variance. One of the major 
disadvantages of this approach is that the noise covariance matrix must be estimated 
accurately from the data. Another is that the factor of interference is not taken into 
account in MNF or NAPC where the interfering effect tends to be more serious than 
noise in hyperspectral images. In order to address these problems, Chang et al. (1998a) 
and Chang and Du (1999) considered the interference as a separate unknown signal source 
and extended the standard signal/noise model to a signal/interference/noise model. As a 
result, more reliable noise estimation can be achieved by eliminating both signals and 
interference. More interestingly, such interference annihilation can even further improve 
PCA performance as shown in Chang and Du (1999). A similar idea was also 
investigated in Thai and Healey (2002) where they considered background as a separate 
signal source. If we interpret the background sources as part of interference, their 
signal/background/noise model is reduced to a special case of the signal/interference/noise 
(SIN) model. A more detailed study on comparison between these two models is given 
in Du and Chang (2001). 
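
The noise-adjusted transform that these methods build on can be sketched in Python with NumPy as noise whitening followed by an ordinary PCA. The sketch assumes that the noise covariance matrix is given, whereas in practice it must be estimated from the data, and it does not include any interference annihilation step. The function name napc and the synthetic data are hypothetical.

```python
import numpy as np

def napc(X, noise_cov):
    """Noise-adjusted principal components (noise whitening, then PCA).

    X : (N, L) image pixel vectors, noise_cov : (L, L) noise covariance matrix.
    Returns the components and their variances in decreasing order.
    """
    evals, evecs = np.linalg.eigh(noise_cov)
    W = evecs / np.sqrt(evals)          # whitening matrix: W^T noise_cov W = I
    Xw = (X - X.mean(axis=0)) @ W       # data with unit-variance noise
    cov_w = Xw.T @ Xw / X.shape[0]
    vals, vecs = np.linalg.eigh(cov_w)
    order = np.argsort(vals)[::-1]      # decreasing variance, i.e. decreasing SNR
    return Xw @ vecs[:, order], vals[order]

# Assumed data: rank-2 signal in 8 bands with band-dependent noise levels
rng = np.random.default_rng(0)
noise_std = np.linspace(0.1, 1.0, 8)
signal = rng.standard_normal((1000, 2)) @ rng.standard_normal((2, 8))
X = signal + noise_std * rng.standard_normal((1000, 8))
components, variances = napc(X, np.diag(noise_std ** 2))
comp_cov = components.T @ components / X.shape[0]
```

Because the noise has unit variance in every direction after whitening, ordering the components by variance is the same as ordering them by SNR. An inaccurate noise covariance estimate would distort this ordering.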

The benefit of considering interference separately has been demonstrated in Chang et 
al. (1998a) and Chang and Du (1999). Several advantages can be gained by interference 
annihilation. Hyperspectral sensors can now extract many 
unknown signal sources, which cannot be resolved by multispectral imagers. Under this 
circumstance, the signal-to-interference and noise ratio (SINR) criterion is more appropriate 
than SNR. Alternatively, SNR can still be used as a criterion after interference is 
annihilated. In either case, separating and eliminating these interferers from noise and 
signals can certainly improve traditional SNR-based methods. In the traditional 
signal/noise model, the role of interference is generally overlooked. It is either treated as 
part of the signal or included in noise. This is primarily due to the fact that it is 
generally difficult to obtain the knowledge of interference in practice. However, according 
to experiments conducted for hyperspectral image data, the interference was shown to be 
an important factor in target detection and classification. Accordingly, estimating the 
noise covariance matrix without adequately eliminating interference may lead to an 
inaccurate and biased estimate. In order to find and locate these unknown signal sources, 
an unsupervised vector quantization algorithm was proposed in Chang et al. (1998c) and 
an orthogonal projection-based algorithm was also used in Chang and Du (1999). These 
two algorithms are among the three unsupervised algorithms presented in Chapter 5. As a 
matter of fact, all the three can be used for the same purpose. 

18.7.5 Band Selection 

Hyperspectral sensors can image an area with hundreds of spectral channels at 
different wavelengths for identification of composition of various materials. As a result, 
each image scene represents an image cube with the third dimension specified by spectral 
range. Such 3-D representation creates enormous amounts of data for computer processing 
and data transmission. Band selection for remotely sensed image data is an effective 
means to mitigate the curse of dimensionality. Taking advantage of spectral correlation to 
achieve optimal band selection is one of the unique features in multispectral/hyperspectral 
images. Many criteria have been proposed for band selection in the past to find bands 
that are crucial and significant in terms of information conservation (Mausel et al., 
1990; Richards, 1993; Conese and Maselli, 1993). For instance, distance measures 
(Bhattacharyya distance, Jeffries-Matusita distance), information theoretic approaches 
(divergence, transformed divergence, mutual information) and eigen-analysis (PCA) have 
been applied to multispectral images for optimal band selection. In particular, the use of 
the divergence measure for band selection has received considerable interest in 
multispectral imagery. More recently, the divergence was also used as a band selection 
criterion for hyperspectral pixel classification (Stearns et al., 1993). However, it requires 
computing divergences for all the possible combinations of band subsets. When the 
divergence measure is calculated for hyperspectral imagery with more than 200 bands, 
such a direct calculation becomes formidable. In order to cope with this problem, an 
alternative approach was reported in Stearns et al. (1993). More recently, a new band 
selection method was developed in Chang et al. (1999), which comprises band 
prioritization and band decorrelation. Band prioritization ranks all bands in 
accordance with the information they contain that can be used for classification. Bands are 
then selected on the basis of their associated priorities. Since band prioritization does not 
consider spectral correlation, band decorrelation using the divergence is then applied to 
decorrelate the prioritized bands. 
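
The two-stage idea of prioritizing bands and then decorrelating them can be illustrated with a minimal Python sketch. It is not the algorithm of Chang et al. (1999): the priority score (band variance), the treatment of each band as a normalized distribution, the divergence threshold and all function names are assumptions made only for illustration.

```python
import math

def variance(values):
    """Population variance of one band's pixel values (assumed priority score)."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)

def to_distribution(values, eps=1e-12):
    """Normalize non-negative band values so that they sum to one."""
    shifted = [v + eps for v in values]  # eps avoids log(0) below
    total = sum(shifted)
    return [v / total for v in shifted]

def symmetric_divergence(p, q):
    """Symmetric Kullback-Leibler divergence D(p||q) + D(q||p)."""
    return sum((pi - qi) * math.log(pi / qi) for pi, qi in zip(p, q))

def select_bands(bands, num_select, min_divergence):
    """bands: list of bands, each a list of non-negative pixel values."""
    # Band prioritization: rank bands by score, highest first.
    order = sorted(range(len(bands)),
                   key=lambda b: variance(bands[b]), reverse=True)
    dists = [to_distribution(band) for band in bands]
    selected = []
    for b in order:
        # Band decorrelation: skip a band that is too similar
        # (low divergence) to any band already kept.
        if all(symmetric_divergence(dists[b], dists[s]) >= min_divergence
               for s in selected):
            selected.append(b)
        if len(selected) == num_select:
            break
    return selected

# Hypothetical toy data: band 0 is a scaled copy of band 1 and is rejected.
toy_bands = [[1, 2, 3, 4], [2, 4, 6, 8], [4, 3, 2, 1], [1, 1, 1, 1]]
chosen = select_bands(toy_bands, num_select=2, min_divergence=0.1)
```

Because each candidate band is compared only with the bands already kept, the sketch avoids evaluating the divergence over all possible band subsets.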

18.7.6 Linear Mixture Analysis-Based Data Compression 

With significantly improved spatial and spectral resolution, hyperspectral imagery 
expands the capability of multispectral imagery in many ways, such as subpixel target 
detection, object discrimination, mixed pixel classification and material quantification. 
It also presents new challenges to image analysts, in particular how to deal effectively 
with its enormous data volume while still achieving their desired goals. One common 
practice is to compress data prior to image analysis. Two types of data compression can 
be performed, lossless and lossy, according to the redundancy removed. More specifically, 
lossless data compression is generally referred to as data compaction, which eliminates 
unnecessary redundancy without loss of any information. By contrast, lossy data 
compression removes unwanted redundancy or insignificant information, which results 
in entropy reduction. Which type of compression is appropriate depends on the 
application. For example, in medical imaging, lossless compression is preferred to lossy 
compression in order to avoid potential lawsuits against doctors. However, in this case, 
only small compression ratios can be achieved, generally less than 3:1. On the other 
hand, video processing, such as HDTV (High Definition TV), can benefit from lossy 
compression. For remotely sensed imagery, both types of compression can be 
beneficial and have been investigated extensively in the past. If our main 
interest is applications in target detection and image classification, the accuracy in 
detection and classification is generally determined by how we can effectively use target 
features rather than the entire data. As a result, lossless compression does not provide 
additional advantages over lossy compression in the sense of feature extraction. 

Recently, a linear spectral mixture analysis-based orthogonal projection (OP) filter 
was proposed to compress AVIRIS data (Farison et al., 1997). The proposed OP filter 
was based on feature extraction and is the same as the OSP classifier discussed in Chapter 8. 
This approach directly deals with the abundance fractional images, which may be more 
effective than dealing with the whole stack of images in a hyperspectral image cube. 
Unfortunately, the OSP filter approach in Farison et al. (1997) is an unconstrained linear 
unmixing method and cannot accurately estimate the abundance fractional images to 
represent desired features, as shown in Chapter 9. In order to overcome this problem, a 
UFCLS-based hyperspectral compression approach was developed in Du et al. (2000), 
based on the UFCLS method discussed in Chapter 10. It differs from the approach in Farison 
et al. (1997) in three important aspects. First, it is a fully constrained LSMA-based 
compression method as opposed to the unconstrained OP filter-based method used in 
Farison et al. (1997). Second, compared to the OP filter-based method, the 
UFCLS-based approach is completely unsupervised. Third, and most importantly, it generates 
a set of target fractional abundance images in an unsupervised manner and compresses image 
data by encoding these target fractional abundance images. Since only targets along with 
their corresponding abundance fractions are required for encoding, a high compression 
ratio can be achieved. Further study of this topic is currently under way. 
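
To see why encoding only the targets and their abundance fractions can yield a high compression ratio, the following Python sketch counts stored samples before and after unmixing. It is a rough estimate under stated assumptions, not the encoder of Du et al. (2000): every stored value is assumed to use the same number of bits, no entropy coding is applied, and the function name and the scene size are hypothetical.

```python
def lsma_compression_ratio(num_pixels, num_bands, num_targets):
    """Ratio of raw sample count to the count stored after unmixing.

    Assumption: the compressed form keeps one abundance value per
    target per pixel, plus one spectral signature per target.
    """
    if min(num_pixels, num_bands, num_targets) <= 0:
        raise ValueError("all counts must be positive")
    raw_samples = num_pixels * num_bands
    abundance_samples = num_pixels * num_targets
    signature_samples = num_targets * num_bands
    return raw_samples / (abundance_samples + signature_samples)

# Hypothetical scene: 100 x 100 pixels, 224 bands, 5 targets.
ratio = lsma_compression_ratio(100 * 100, 224, 5)
```

For large scenes the signature overhead becomes negligible, so the ratio approaches the number of bands divided by the number of targets.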

18.7.7 Radial Basis Function Neural Network Approach 

Historically, linear spectral mixture analysis (LSMA) has been widely used to 
describe mixed spectra collected by hyperspectral imagers or laboratory-based 
spectrometers. However, on some occasions, a non-linear mixture model may better 
describe the mixture spectra for certain endmember distributions (Guilfoyle et al., 2001, 
2002). It has been shown by Mustard and Pieters (1989) that the use of a non-linear 
mixture model based on Hapke’s bi-directional spectroscopy theory (1981), referred to as 
the intimate mixture model, improves abundance estimates in intimate mixtures of soils. In 
order to evaluate both of these mixing models, a testbed was developed in which both 
models can be easily applied and verified. A linear mixture model is often used to 
describe those situations in which the endmember components are distributed in 
block-like areas within the field-of-view of the instrument. These situations may occur when 
the spectrometer field-of-view passes over discrete regions such as fields, lakes, rivers and 
forests. In these cases, one would expect the resultant reflectance spectrum to be a linear 
combination of endmembers present in the region, for example, a river water component 
and perhaps a soil component (for the river bank). The linear model assumes that source 
radiation is singly reflected from the endmember substances and then collected by the 
imaging spectrometer. 
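
The linear model described above can be sketched in a few lines of Python as a weighted sum of endmember spectra. This is only an illustration: the reflectance values for the water and soil endmembers are made up, the function name is hypothetical, and the non-negativity and sum-to-one checks on the abundances are assumptions corresponding to a fully constrained model.

```python
def linear_mixture(endmembers, abundances, tol=1e-9):
    """Mix endmember spectra linearly, band by band.

    endmembers: list of spectra, each a list of per-band reflectances.
    abundances: one fraction per endmember (assumed >= 0, summing to 1).
    """
    if len(endmembers) != len(abundances):
        raise ValueError("one abundance is needed per endmember")
    if any(a < 0 for a in abundances) or abs(sum(abundances) - 1.0) > tol:
        raise ValueError("abundances must be non-negative and sum to one")
    num_bands = len(endmembers[0])
    return [sum(a * e[b] for a, e in zip(abundances, endmembers))
            for b in range(num_bands)]

# Made-up three-band spectra for a river pixel: 70% water, 30% bank soil.
water = [0.05, 0.04, 0.02]
soil = [0.20, 0.30, 0.40]
mixed = linear_mixture([water, soil], [0.7, 0.3])
```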

Non-linear mixtures occur in situations where endmember components are randomly 
distributed throughout the field-of-view of the instrument. Such situations may occur, for 
example, when viewing striated soils, in areas where multiple rock types are all visible 
on the region’s surface, or in identifying trees in a forest (assuming reflectance spectra 
differences exist). In these cases, the resultant mixture reflectance spectrum may best be 
described by assuming that source radiation is multiply scattered by the randomly 
distributed endmembers before being collected by the imaging spectrometer. Recently, 
Guilfoyle et al. (2001, 2002) conducted experiments in which linear and 
non-linear mixtures were created with colored sand according to the above 
descriptions. The resultant mixed spectra were analyzed by a radial basis function neural 
network (RBFNN) developed by Leung and Lo (1995). The results show that the 
non-linear model does match the randomly distributed mixture spectra, while the linear model 
better matches the discrete region mixture (Mustard and Pieters, 1989). 



18.8 APPLICATIONS TO MAGNETIC RESONANCE IMAGING 

As the final section of this chapter, we conclude this book with a discussion of 
applications of hyperspectral imaging techniques to magnetic resonance imaging (MRI). 
Nuclear magnetic resonance (NMR) has recently developed into a versatile technique in 
many fields, such as chemistry, physics and engineering, because its signals provide rich 
information about material structures, including the nature of a population of atoms, 
the structure of their environment, and the way in which the atoms interact with their 
environment. When NMR is applied to human anatomy, NMR signals can be used to 
measure the nuclear spin density, the interactions of the nuclei with their surrounding 
molecular environment and those between close nuclei, respectively. It produces a 
sequence of multiple spectral images of tissues with a variety of contrasts using three 
magnetic resonance (MR) parameters, spin-lattice (T1), spin-spin (T2) and dual echo-echo 
proton density (PD). By appropriately choosing the pulse sequence parameters, echo time 
(TE) and repetition time (TR), a sequence of images of a specific anatomic area can be 
generated with pixel intensities that represent characteristics of different types of tissues 
throughout the sequence. As a result, magnetic resonance imaging (MRI) becomes a more 
useful imaging modality than X-ray computerized tomography (X-CT) when it comes to 
analysis of soft tissues and organs, since the information about T1 and T2 offers a more 
precise picture of tissue functionality than that produced by X-CT. Interestingly, such 
obtained MR images can be viewed as image sequences acquired by different pulse 
sequence parameters remotely. So, an MR image pixel can be considered as a 
multispectral image pixel vector, of which each component represents an image pixel 
acquired by a particular band that is specified by a pulse sequence parameter. With this 
interpretation, Harsanyi and Chang applied the eigen-image technique developed for MRI 
(Miller et al., 1992) to derive their OSP technique (Harsanyi and Chang, 1994) for 
hyperspectral image classification. Most recently, Du and Chang (2000) further extended 
an MR image classification technique developed by Soltanian-Zadeh et al. (1996) to 
LCDA for constrained mixed pixel classification. Despite great success in applying MRI 
techniques to hyperspectral imaging, the opposite direction has received little 
attention. In a series of recent papers by Wang et al. (2002a, 2002b, 
2002c) and Wang’s Ph.D. dissertation (Wang, 2002), applications of hyperspectral 
imaging techniques to MRI have been studied. They first applied the OSP discussed in 
Chapters 3 and 8 to MR image classification (Wang et al., 2002a) and later extended it to 
an unsupervised OSP (UOSP) (Wang et al., 2002b), where the DTDCA described in Section 
13.3 of Chapter 13 was used as an unsupervised algorithm to generate unknown objects 
in MR images. More details about applications of OSP and UOSP can be found in Wang 
(2002). In order to minimize effects caused by unknown brain tissues in the image 
background, Wang et al. (2003) further applied CEM to detection of spectral features in 
MR images and showed that CEM could be a promising technique for MRI when 
knowledge of the MR image background is not available. Specifically, as shown in Wang et 
al. (2002c), the 3-D ROC analysis proposed in Section 15.5 of Chapter 15 provided an 
effective tool for performance evaluation. The work in Wang et al. (2002a, 2002b, 2002c) 
demonstrates that the potential of hyperspectral imaging techniques presented in this 
book is not necessarily limited to applications of remote sensing image processing. Other 
applications such as MRI can also benefit from these techniques. 
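
The interpretation of an MR image pixel as a multispectral pixel vector, on which the techniques in this section rely, can be sketched in Python as follows. The sketch assumes co-registered images of equal size, one per pulse sequence, and the function name and toy intensity values are hypothetical.

```python
def stack_pixel_vectors(images):
    """Stack co-registered 2-D images into per-pixel vectors.

    images: list of images (each a list of rows), one per pulse
    sequence, so each image plays the role of one spectral band.
    Returns cube[row][col] = [value in image 0, value in image 1, ...].
    """
    rows, cols = len(images[0]), len(images[0][0])
    for img in images:
        if len(img) != rows or any(len(r) != cols for r in img):
            raise ValueError("all images must have the same size")
    return [[[img[r][c] for img in images] for c in range(cols)]
            for r in range(rows)]

# Toy 2 x 2 images standing in for T1-, T2- and PD-weighted acquisitions.
t1_image = [[1, 2], [3, 4]]
t2_image = [[5, 6], [7, 8]]
pd_image = [[9, 10], [11, 12]]
cube = stack_pixel_vectors([t1_image, t2_image, pd_image])
pixel_vector = cube[0][1]  # the three-component vector at row 0, column 1
```

Once the images are arranged this way, each pixel vector can be passed to a spectral classifier in the same manner as a pixel of a multispectral or hyperspectral image cube.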